
    Differentially Private Transferrable Deep Learning with Membership-Mappings. (arXiv:2105.04615v6 [cs.LG] UPDATED)
    This paper considers the problem of differentially private semi-supervised transfer and multi-task learning. The notion of \emph{membership-mapping} has been developed on a measure-theoretic basis to learn data representation via a fuzzy membership function. An alternative conception of the deep autoencoder, referred to as \emph{Conditionally Deep Membership-Mapping Autoencoder (CDMMA)}, is considered for transferrable deep learning. Under practice-oriented settings, an analytical solution for the learning of CDMMA can be derived by means of variational optimization. The paper proposes a transfer and multi-task learning approach that combines CDMMA with a tailored noise-adding mechanism to achieve a given privacy-loss bound with the minimum perturbation of the data. Numerous experiments were carried out using the MNIST, USPS, Office, and Caltech256 datasets to verify the competitive robust performance of the proposed methodology.
    DynG2G: An Efficient Stochastic Graph Embedding Method for Temporal Graphs. (arXiv:2109.13441v2 [cs.LG] UPDATED)
    Dynamic graph embedding has gained great attention recently due to its capability of learning low-dimensional graph representations for complex temporal graphs with high accuracy. However, recent advances mostly focus on learning node embeddings as deterministic "vectors" for static graphs, disregarding the key graph temporal dynamics and the evolving uncertainties associated with node embedding in the latent space. In this work, we propose an efficient stochastic dynamic graph embedding method (DynG2G) that applies an inductive feed-forward encoder trained with a node triplet-based contrastive loss. Every node per timestamp is encoded as a time-dependent probabilistic multivariate Gaussian distribution in the latent space, so we can quantify the node embedding uncertainty on-the-fly. We adopted eight different benchmarks that represent diversity in size (from 96 to 87,626 nodes and from 13,398 to 4,870,863 edges) and diversity in dynamics. We demonstrate via extensive experiments on these eight dynamic graph benchmarks that DynG2G achieves new state-of-the-art performance in capturing the underlying temporal node embeddings. We also demonstrate that DynG2G can predict the evolving node embedding uncertainty, which plays a crucial role in quantifying the intrinsic dimensionality of the dynamical system over time. We obtain a universal relation between the optimal embedding dimension, $L_o$, and the effective dimensionality of uncertainty, $D_u$, and we infer that $L_o=D_u$ for all cases. This implies that the uncertainty quantification approach we employ in DynG2G correctly captures the intrinsic dimensionality of the dynamics of such evolving graphs despite the diverse nature and composition of the graphs at each timestamp. Moreover, this $L_o - D_u$ correlation provides a clear path to adaptively select the optimum embedding size at each timestamp by setting $L \ge D_u$.
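    The probabilistic-embedding idea can be made concrete with a small sketch. Below is a hedged illustration (not the authors' implementation) of the two ingredients the abstract mentions: a KL divergence between diagonal Gaussian embeddings and a Graph2Gauss-style triplet ranking term. The function names and the exact energy form are assumptions for this example.

```python
import math

def kl_diag_gauss(mu0, var0, mu1, var1):
    """KL divergence KL(N(mu0, var0) || N(mu1, var1)) for diagonal Gaussians."""
    return 0.5 * sum(v0 / v1 + (m1 - m0) ** 2 / v1 - 1 + math.log(v1 / v0)
                     for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1))

def triplet_loss(anchor, pos, neg):
    """Square-exponential triplet ranking term (Graph2Gauss style): push the
    KL to a positive (nearby) node below the KL to a negative (distant) node.
    Each argument is a (mu, var) pair of lists."""
    d_pos = kl_diag_gauss(*anchor, *pos)
    d_neg = kl_diag_gauss(*anchor, *neg)
    return d_pos ** 2 + math.exp(-d_neg)
```

    Because the embedding is a distribution rather than a point, the per-node variance is available for free as an uncertainty estimate at every timestamp.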
    Tracking Most Significant Arm Switches in Bandits. (arXiv:2112.13838v5 [cs.LG] UPDATED)
    In bandits with distribution shifts, one aims to automatically adapt to unknown changes in the reward distribution and restart exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually severe. To resolve this question, we propose a new notion of significant shift, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best-arm switches but also large aggregate differences in reward over time. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best-arm switches, while at the same time always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of total variation $V$ (which aggregates differences over time). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.
    Monte Carlo Tree Search: A Review of Recent Modifications and Applications. (arXiv:2103.04931v3 [cs.AI] UPDATED)
    Monte Carlo Tree Search (MCTS) is a powerful approach to designing game-playing bots or solving sequential decision problems. The method relies on intelligent tree search that balances exploration and exploitation. MCTS performs random sampling in the form of simulations and stores statistics of actions to make more educated choices in each subsequent iteration. The method has become a state-of-the-art technique for combinatorial games; however, in more complex games (e.g. those with a high branching factor or real-time ones), as well as in various practical domains (e.g. transportation, scheduling or security), an efficient MCTS application often requires a problem-dependent modification or integration with other techniques. Such domain-specific modifications and hybrid approaches are the main focus of this survey. The last major MCTS survey was published in 2012; contributions that have appeared since its release are of particular interest for this review.
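    For readers unfamiliar with the core loop, a minimal UCT-style sketch on a toy take-away game is given below. It is a deliberate simplification (every visited state-action pair keeps statistics, with no separate rollout policy), and all names are illustrative, not taken from any surveyed system.

```python
import math, random

TAKE = (1, 2, 3)  # legal moves: remove 1-3 objects; taking the last one wins

def moves(n):
    return [m for m in TAKE if m <= n]

def uct_search(state, iters=3000, c=1.4):
    """Simplified MCTS/UCT: repeated simulations that balance exploration
    and exploitation via the UCB1 rule, storing per-(state, move) stats."""
    N, W = {}, {}  # visit counts and accumulated wins per (state, move)
    for _ in range(iters):
        path, n, player = [], state, 0
        while n > 0:
            ms = moves(n)
            fresh = [m for m in ms if (n, m) not in N]
            if fresh:                        # expansion: try an unvisited move
                m = random.choice(fresh)
            else:                            # selection: UCB1 over known moves
                total = sum(N[(n, mm)] for mm in ms)
                m = max(ms, key=lambda mm: W[(n, mm)] / N[(n, mm)]
                        + c * math.sqrt(math.log(total) / N[(n, mm)]))
            path.append((n, m, player))
            n, player = n - m, player ^ 1
        winner = player ^ 1                  # the player who just moved wins
        for s, m, p in path:                 # backpropagation
            N[(s, m)] = N.get((s, m), 0) + 1
            W[(s, m)] = W.get((s, m), 0) + (p == winner)
    return max(moves(state), key=lambda m: N.get((state, m), 0))
```

    In this toy Nim variant the winning strategy is to leave a multiple of four, so from a heap of five the search should converge on taking one object.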
    COSTI: a New Classifier for Sequences of Temporal Intervals. (arXiv:2204.13467v1 [cs.LG])
    Classification of sequences of temporal intervals is a part of time series analysis which concerns series of events. We propose a new method of transforming the problem into a task of multivariate series classification. We use one of the state-of-the-art algorithms from the latter domain on the new representation to obtain significantly better accuracy than the state-of-the-art methods from the former field. We discuss the limitations of this workflow and address them by developing a novel method for classification, termed COSTI (short for Classification of Sequences of Temporal Intervals), operating directly on sequences of temporal intervals. The proposed method remains at a high level of accuracy and obtains better performance while avoiding the shortcomings of operating on transformed data. We propose a generalized version of the problem of classification of temporal intervals, where each event is supplemented with information about its intensity. We also provide two new data sets where this information is of substantial value.
    Real-time Outdoor Localization Using Radio Maps: A Deep Learning Approach. (arXiv:2106.12556v2 [cs.LG] UPDATED)
    This paper deals with the problem of localization in a cellular network in a dense urban scenario. Global Navigation Satellite Systems typically perform poorly in urban environments, where the likelihood of line-of-sight conditions between the devices and the satellites is low, so alternative localization methods are required for good accuracy. We present LocUNet: a fully convolutional, end-to-end trained neural network for the localization task, which merely depends on the received signal strengths (RSS) from Base Stations (BSs). In a wireless network, user devices scan the base station beacon slots and identify the few strongest base station signals for handover and user-base station association purposes. In the proposed method, the user to be localized simply reports such received signal strengths to a central processing unit, which may be located in the cloud; alternatively, the localization can be performed locally at the user. Using the pathloss radio map estimations and the RSS measurements, LocUNet can localize users with state-of-the-art accuracy and enjoys high robustness to inaccuracies in the radio map estimations. The proposed method does not require pre-sampling of the environment and is suitable for real-time applications, thanks to RadioUNet, a neural network-based radio map estimator. Moreover, two novel datasets that allow for numerical evaluations of RSS and ToA methods in realistic urban environments are presented and made publicly available for the use of the research community. Using these datasets, we also provide a fair comparison of state-of-the-art RSS- and ToA-based methods in the dense urban scenario, with LocUNet outperforming all compared methods.
    Performance analysis of greedy algorithms for minimising a Maximum Mean Discrepancy. (arXiv:2101.07564v2 [stat.ML] UPDATED)
    We analyse the performance of several iterative algorithms for the quantisation of a probability measure $\mu$, based on the minimisation of a Maximum Mean Discrepancy (MMD). Our analysis includes kernel herding, greedy MMD minimisation and Sequential Bayesian Quadrature (SBQ). We show that the finite-sample-size approximation error, measured by the MMD, decreases as $1/n$ for SBQ and also for kernel herding and greedy MMD minimisation when using a suitable step-size sequence. The upper bound on the approximation error is slightly better for SBQ, but the other methods are significantly faster, with a computational cost that increases only linearly with the number of points selected. This is illustrated by two numerical examples, with the target measure $\mu$ being uniform (a space-filling design application) and with $\mu$ a Gaussian mixture. They suggest that the bounds derived in the paper are overly pessimistic, in particular for SBQ. The sources of this pessimism are identified but seem difficult to counter.
    A Locally Adaptive Interpretable Regression. (arXiv:2005.03350v4 [stat.ML] UPDATED)
    Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity of simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts the percentile of a Gaussian distribution for the regression coefficients, enabling rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than other state-of-the-art baselines but also discovers interesting relationships between input and target variables, such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). Therefore, LoAIR is a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without sacrificing its interpretability.
    Reappraising Domain Generalization in Neural Networks. (arXiv:2110.07981v3 [cs.LG] UPDATED)
    Given that neural networks generalize unreasonably well in the IID setting (with benign overfitting and improved performance with more parameters), out-of-distribution (OOD) generalization presents a consistent failure case that can better our understanding of how they learn. This paper focuses on Domain Generalization (DG), which is perceived as the front face of OOD generalization. We find that the presence of multiple domains incentivizes domain-agnostic learning and is the primary reason for generalization in Traditional DG. We show that state-of-the-art results can be obtained by borrowing ideas from IID generalization, and that DG-tailored methods fail to add any performance gains. Furthermore, we perform explorations beyond the Traditional DG (TDG) formulation and propose a novel ClassWise DG (CWDG) benchmark, where for each class we randomly select one of the domains and keep it aside for testing. Despite being exposed to all domains during training, CWDG is more challenging than TDG evaluation. We propose a novel iterative domain feature masking approach, achieving state-of-the-art results on the CWDG benchmark. Overall, while explaining these observations, our work furthers insights into the learning mechanisms of neural networks.
    High Dimensional Quantum Machine Learning With Small Quantum Computers. (arXiv:2203.13739v2 [quant-ph] UPDATED)
    Quantum computers hold great promise to enhance machine learning, but their current qubit counts restrict the realisation of this promise. In an attempt to mitigate this limitation, techniques can be applied for evaluating a quantum circuit using a machine with fewer qubits than the circuit naively requires. These techniques work by evaluating many smaller circuits on the smaller machine, which are then combined in a polynomial to replicate the output of the larger machine. This scheme requires more circuit evaluations than are practical for general circuits. However, we investigate the possibility that for certain applications many of these subcircuits are superfluous, and that a much smaller sum is sufficient to estimate the full circuit. We construct a machine learning model that may be capable of approximating the outputs of the larger circuit with far fewer circuit evaluations. We successfully apply our model to the task of digit recognition, using simulated quantum computers much smaller than the data dimension. The model is also applied to the task of approximating a random 10-qubit parameterised quantum circuit (PQC) with simulated access to a 5-qubit computer; even with only a relatively modest number of circuits, our model provides an accurate approximation of the 10-qubit PQC's output, superior to a neural network attempt. The developed method might be useful for implementing quantum models on larger data throughout the NISQ era.
    Curriculum Learning for Dense Retrieval Distillation. (arXiv:2204.13679v1 [cs.IR])
    Recent work has shown that more effective dense retrieval models can be obtained by distilling ranking knowledge from an existing base re-ranking model. In this paper, we propose a generic curriculum learning based optimization framework called CL-DRD that controls the difficulty level of training data produced by the re-ranking (teacher) model. CL-DRD iteratively optimizes the dense retrieval (student) model by increasing the difficulty of the knowledge distillation data made available to it. In more detail, we initially provide the student model with coarse-grained preference pairs between documents in the teacher's ranking and progressively move towards finer-grained pairwise document ordering requirements. In our experiments, we apply a simple implementation of the CL-DRD framework to enhance two state-of-the-art dense retrieval models. Experiments on three public passage retrieval datasets demonstrate the effectiveness of our proposed framework.
    Unified Simulation, Perception, and Generation of Human Behavior. (arXiv:2204.13678v1 [cs.CV])
    Understanding and modeling human behavior is fundamental to almost any computer vision and robotics applications that involve humans. In this thesis, we take a holistic approach to human behavior modeling and tackle its three essential aspects -- simulation, perception, and generation. Throughout the thesis, we show how the three aspects are deeply connected and how utilizing and improving one aspect can greatly benefit the other aspects. We also discuss the lessons learned and our vision for what is next for human behavior modeling.
    Signal Recovery with Non-Expansive Generative Network Priors. (arXiv:2204.13599v1 [eess.SP])
    We study compressive sensing with a deep generative network prior. Initial theoretical guarantees for efficient recovery from compressed linear measurements have been developed for signals in the range of a ReLU network with Gaussian weights and logarithmic expansivity: that is, when each layer is larger than the previous one by a logarithmic factor. It was later shown that constant expansivity is sufficient for recovery. It has remained open whether the expansivity can be relaxed, allowing for networks with contractive layers, as is often the case for real generators. In this work we answer this question, proving that a signal in the range of a Gaussian generative network can be recovered from a few linear measurements provided that the width of the layers is proportional to the input layer size (up to log factors). This condition allows the generative network to have contractive layers. Our result is based on showing that Gaussian matrices satisfy a matrix concentration inequality, which we term the Range Restricted Weight Distribution Condition (R2WDC), and which weakens the Weight Distribution Condition (WDC) upon which previous theoretical guarantees were based. The WDC has also been used to analyze other signal recovery problems with generative network priors. By replacing the WDC with the R2WDC, we are able to extend previous results for signal recovery with expansive generative network priors to non-expansive ones. We discuss these extensions for phase retrieval, denoising, and spiked matrix recovery.
    Should Machine Learning Models Report to Us When They Are Clueless? (arXiv:2203.12131v2 [cs.LG] UPDATED)
    The right to AI explainability has consolidated as a consensus in the research community and policy-making. However, a key component of explainability has been missing: extrapolation, which describes the extent to which AI models can be clueless when they encounter unfamiliar samples (i.e., samples outside the convex hull of their training sets, as we will explain). We report that AI models extrapolate outside their range of familiar data, frequently and without notifying the users and stakeholders. Knowing whether a model has extrapolated or not is a fundamental insight that should be included in explaining AI models in favor of transparency and accountability. Instead of dwelling on the negatives, we offer ways to clear the roadblocks in promoting AI transparency. Our analysis is accompanied by practical clauses that would be useful to include in AI regulations such as the National AI Initiative Act in the US and the AI Act by the European Commission.
    Revisiting Bayesian Autoencoders with MCMC. (arXiv:2104.05915v2 [cs.LG] UPDATED)
    Autoencoders gained popularity in the deep learning revolution given their ability to compress data and provide dimensionality reduction. Although prominent deep learning methods have been used to enhance autoencoders, the need to provide robust uncertainty quantification remains a challenge; so far, this has been addressed with variational autoencoders. Bayesian inference via Markov Chain Monte Carlo (MCMC) sampling has faced several limitations for large models; however, recent advances in parallel computing and advanced proposal schemes have opened routes less traveled. This paper presents Bayesian autoencoders powered by MCMC sampling, implemented using parallel computing and a Langevin-gradient proposal distribution. The results indicate that the proposed Bayesian autoencoder provides accuracy similar to related methods in the literature, while additionally providing uncertainty quantification in the reduced data representation. This motivates further applications of the Bayesian autoencoder framework for other deep learning models.
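    A toy version of a Langevin-gradient proposal inside Metropolis-Hastings can be sketched as follows for a one-parameter Bayesian linear model. This is only a schematic analogue of the paper's approach (which targets full autoencoder weights with parallel computing); every name, prior, and constant here is an illustrative assumption.

```python
import math, random

def langevin_mcmc(xs, ys, n_samples=5000, step=0.01, sigma=1.0, tau=10.0):
    """Metropolis-Hastings with a Langevin-gradient proposal for the
    posterior over a single weight w in y = w*x + N(0, sigma^2),
    with prior w ~ N(0, tau^2)."""
    def log_post(w):
        ll = -sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / (2 * sigma ** 2)
        return ll - w ** 2 / (2 * tau ** 2)

    def grad(w):  # gradient of the log posterior
        g = sum(x * (y - w * x) for x, y in zip(xs, ys)) / sigma ** 2
        return g - w / tau ** 2

    def log_q(to, frm):  # log density of the asymmetric proposal q(to | frm)
        mean = frm + 0.5 * step * grad(frm)
        return -((to - mean) ** 2) / (2 * step)

    w, samples = 0.0, []
    for _ in range(n_samples):
        # Langevin proposal: gradient drift plus Gaussian noise
        prop = w + 0.5 * step * grad(w) + math.sqrt(step) * random.gauss(0, 1)
        log_alpha = (log_post(prop) + log_q(w, prop)
                     - log_post(w) - log_q(prop, w))
        if random.random() < math.exp(min(0.0, log_alpha)):
            w = prop
        samples.append(w)
    return samples
```

    The gradient drift steers proposals toward high-posterior regions, which is what makes Langevin proposals mix faster than random-walk MCMC for large models.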
    Model architecture can transform catastrophic forgetting into positive transfer. (arXiv:2108.03940v3 [cs.LG] UPDATED)
    The work of McCloskey and Cohen popularized the concept of catastrophic interference. They used a neural network that tried to learn addition using two groups of examples as two different tasks. In their case, learning the second task rapidly deteriorated the acquired knowledge about the previous one. We hypothesize that this could be a symptom of a fundamental problem: addition is an algorithmic task that should not be learned through pattern recognition. Therefore, other model architectures better suited for this task would avoid catastrophic forgetting. We use a neural network with a different architecture that can be trained to recover the correct algorithm for the addition of binary numbers. This neural network includes conditional clauses that are naturally treated within the back-propagation algorithm. We test it in the setting proposed by McCloskey and Cohen, training on random additions one by one. The neural network not only does not suffer from catastrophic forgetting but also improves its predictive power on unseen pairs of numbers as training progresses. We also show that this is a robust effect that persists when averaging many simulations. This work emphasizes the importance of neural network architecture for the emergence of catastrophic forgetting and introduces a neural network that is able to learn an algorithm.
    Ultrasound Shear Wave Elasticity Imaging with Spatio-Temporal Deep Learning. (arXiv:2204.05745v2 [eess.IV] UPDATED)
    Ultrasound shear wave elasticity imaging is a valuable tool for quantifying the elastic properties of tissue. Typically, the shear wave velocity is derived and mapped to an elasticity value, which neglects information such as the shape of the propagating shear wave or push sequence characteristics. We present 3D spatio-temporal CNNs for fast local elasticity estimation from ultrasound data. This approach is based on retrieving elastic properties from shear wave propagation within small local regions. A large training data set is acquired with a robot from homogeneous gelatin phantoms ranging from 17.42 kPa to 126.05 kPa with various push locations. The results show that our approach can estimate elastic properties on a pixelwise basis with a mean absolute error of 5.01+-4.37 kPa. Furthermore, we estimate local elasticity independent of the push location and can even perform accurate estimates inside the push region. For phantoms with embedded inclusions, we report a 53.93% lower MAE (7.50 kPa), and an 85.24% lower MAE (1.64 kPa) on the background, compared to a conventional shear wave method. Overall, our method offers fast local estimations of elastic properties with small spatio-temporal window sizes.
    Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks. (arXiv:2204.10496v2 [cs.CV] UPDATED)
    Cross-modal encoders for vision-language (VL) tasks are often pretrained with carefully curated vision-language datasets. While these datasets reach an order of 10 million samples, the labor cost is prohibitive to scale further. Conversely, unimodal encoders are pretrained with simpler annotations that are less cost-prohibitive, achieving scales of hundreds of millions to billions. As a result, unimodal encoders have achieved state-of-the-art (SOTA) results on many downstream tasks. However, challenges remain when applying them to VL tasks: the pretraining data is not optimal for cross-modal architectures and requires heavy computational resources, and unimodal architectures lack the cross-modal interactions that have demonstrated significant benefits for VL tasks. Therefore, how to best leverage pretrained unimodal encoders for VL tasks is still an area of active research. In this work, we propose a method to leverage unimodal vision and text encoders for VL tasks that augments existing VL approaches while conserving computational complexity. Specifically, we propose Multimodal Adaptive Distillation (MAD), which adaptively distills useful knowledge from pretrained encoders to cross-modal VL encoders. In addition, to better capture nuanced impacts on VL task performance, we introduce an evaluation protocol that includes Visual Commonsense Reasoning (VCR), Visual Entailment (SNLI-VE), and Visual Question Answering (VQA), across a variety of data constraints and conditions of domain shift. Experiments demonstrate that MAD leads to consistent gains in the low-shot, domain-shifted, and fully-supervised conditions on VCR, SNLI-VE, and VQA, achieving SOTA performance on VCR compared to other single models pretrained with image-text data. Finally, MAD outperforms concurrent works utilizing a pretrained vision encoder from CLIP. Code will be made available.
    Cooperative Multi-Agent Reinforcement Learning with Hypergraph Convolution. (arXiv:2112.06771v2 [cs.AI] UPDATED)
    Recent years have witnessed the great success of multi-agent systems (MAS). Value decomposition, which decomposes joint action values into individual action values, has been an important line of work in MAS. However, many value decomposition methods ignore the coordination among different agents, leading to the notorious "lazy agents" problem. To enhance coordination in MAS, this paper proposes HyperGraph CoNvolution MIX (HGCN-MIX), a method that incorporates hypergraph convolution with value decomposition. HGCN-MIX models agents as well as their relationships as a hypergraph, where agents are nodes and hyperedges among nodes indicate that the corresponding agents can coordinate to achieve larger rewards. It then trains a hypergraph that can capture the collaborative relationships among agents. Leveraging the learned hypergraph to consider how other agents' observations and actions affect their decisions, the agents in a MAS can coordinate better. We evaluate HGCN-MIX in the StarCraft II multi-agent challenge benchmark. The experimental results demonstrate that HGCN-MIX can train joint policies that outperform, or achieve a similar level of performance as, the current state-of-the-art techniques. We also observe that HGCN-MIX yields an even more significant performance improvement in scenarios with a large number of agents. In addition, we conduct further analysis showing that when the hypergraph learns more relationships, HGCN-MIX can train stronger joint policies.
    An Explainable Regression Framework for Predicting Remaining Useful Life of Machines. (arXiv:2204.13574v1 [cs.LG])
    Prediction of a machine's Remaining Useful Life (RUL) is one of the key tasks in predictive maintenance. The task is treated as a regression problem where Machine Learning (ML) algorithms are used to predict the RUL of machine components. These ML algorithms are generally used as black boxes, with a total focus on performance and without identifying the potential causes behind the algorithms' decisions and their working mechanism. We believe performance (in terms of Mean Squared Error (MSE), etc.) alone is not enough to build stakeholders' trust in ML predictions; rather, more insight into the causes behind the predictions is needed. To this end, in this paper we explore the potential of Explainable AI (XAI) techniques by proposing an explainable regression framework for the prediction of machines' RUL. We also evaluate several ML algorithms, including classical and Neural Network (NN) based solutions, for the task. For the explanations, we rely on two model-agnostic XAI methods, namely Local Interpretable Model-Agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP). We believe this work will provide a baseline for future research in the domain.
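    To make the SHAP idea concrete: Shapley values attribute a prediction to individual features by averaging each feature's marginal contribution over all coalitions of the other features. The brute-force sketch below is illustrative only (the SHAP library uses far more efficient approximations, and the baseline-substitution scheme here is one simple choice); it computes exact values for a handful of features.

```python
from itertools import combinations
import math

def shapley_values(predict, baseline, instance):
    """Exact Shapley values for one prediction: features outside a
    coalition are replaced by their baseline value. Exponential cost,
    so only usable for a handful of features."""
    n = len(instance)
    feats = list(range(n))
    phi = [0.0] * n
    for i in feats:
        others = [j for j in feats if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # classic Shapley coalition weight |S|! (n-|S|-1)! / n!
                weight = (math.factorial(len(S)) *
                          math.factorial(n - len(S) - 1) / math.factorial(n))
                def value(coal):
                    x = [instance[j] if j in coal else baseline[j]
                         for j in feats]
                    return predict(x)
                phi[i] += weight * (value(set(S) | {i}) - value(set(S)))
    return phi
```

    A useful sanity check is the efficiency property: the attributions sum exactly to the difference between the model's prediction on the instance and on the baseline.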
    Structured Pruning Learns Compact and Accurate Models. (arXiv:2204.00408v2 [cs.CL] UPDATED)
    The growing size of neural language models has led to increased attention to model compression. The two predominant approaches are pruning, which gradually removes weights from a pre-trained model, and distillation, which trains a smaller compact model to match a larger one. Pruning methods can significantly reduce the model size but, unlike distillation, hardly achieve large speedups. Distillation methods, in turn, require large amounts of unlabeled data and are expensive to train. In this work, we propose a task-specific structured pruning method, CoFi (Coarse- and Fine-grained Pruning), which delivers highly parallelizable subnetworks and matches distillation methods in both accuracy and latency, without resorting to any unlabeled data. Our key insight is to jointly prune coarse-grained (e.g., layers) and fine-grained (e.g., heads and hidden units) modules, controlling the pruning decision for each parameter with masks of different granularity. We also devise a layerwise distillation strategy to transfer knowledge from unpruned to pruned models during optimization. Our experiments on GLUE and SQuAD datasets show that CoFi yields models with over 10x speedups and a small accuracy drop, demonstrating its effectiveness and efficiency compared to previous pruning and distillation approaches.
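    The granularity idea can be illustrated with a toy forward pass where a coarse mask disables a whole layer (replaced by an identity/skip connection) and fine masks zero out individual hidden units. This is only a schematic of the masking mechanics, not CoFi's actual architecture or training objective; all names here are invented for the example.

```python
def masked_forward(x, layers, layer_mask, unit_masks):
    """Toy forward pass where coarse-grained masks (whole layers) and
    fine-grained masks (individual hidden units) jointly control pruning.
    `layers` is a list of (weight_matrix, bias) pairs; layers are assumed
    square so a pruned layer can be skipped via an identity connection."""
    h = x
    for (W, b), lm, um in zip(layers, layer_mask, unit_masks):
        if lm == 0:          # coarse mask: the whole layer is pruned
            continue          # identity / skip connection
        out = []
        for row_w, bj, mj in zip(W, b, um):
            if mj == 0:      # fine mask: this hidden unit is pruned
                out.append(0.0)
            else:
                out.append(sum(w * v for w, v in zip(row_w, h)) + bj)
        h = out
    return h
```

    Making both mask levels learnable (e.g. with L0-style relaxations) is what lets a method decide jointly whether to drop an entire layer or just a few units inside it.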
    Scan Specific Artifact Reduction in K-space (SPARK) Neural Networks Synergize with Physics-based Reconstruction to Accelerate MRI. (arXiv:2104.01188v3 [eess.SP] UPDATED)
    Purpose: To develop a scan-specific model that estimates and corrects k-space errors made when reconstructing accelerated Magnetic Resonance Imaging (MRI) data. Methods: Scan-Specific Artifact Reduction in k-space (SPARK) trains a convolutional-neural-network to estimate and correct k-space errors made by an input reconstruction technique by back-propagating from the mean-squared-error loss between an auto-calibration signal (ACS) and the input technique's reconstructed ACS. First, SPARK is applied to GRAPPA and demonstrates improved robustness over other scan-specific models, such as RAKI and residual-RAKI. Subsequent experiments demonstrate that SPARK synergizes with residual-RAKI to improve reconstruction performance. SPARK also improves reconstruction quality when applied to advanced acquisition and reconstruction techniques like 2D virtual coil (VC-) GRAPPA, 2D LORAKS, 3D GRAPPA without an integrated ACS region, and 2D/3D wave-encoded images. Results: SPARK yields 1.5x - 2x RMSE reduction when applied to GRAPPA and improves robustness to ACS size for various acceleration rates in comparison to other scan-specific techniques. When applied to advanced reconstruction techniques such as residual-RAKI, 2D VC-GRAPPA and LORAKS, SPARK achieves up to 20% RMSE improvement. SPARK with 3D GRAPPA also improves performance by ~2x and perceived image quality without a fully sampled ACS region. Finally, SPARK synergizes with non-cartesian 2D and 3D wave-encoding imaging by reducing RMSE between 20-25% and providing qualitative improvements. Conclusion: SPARK synergizes with physics-based acquisition and reconstruction techniques to improve accelerated MRI by training scan-specific models to estimate and correct reconstruction errors in k-space.
    Standardized Evaluation of Machine Learning Methods for Evolving Data Streams. (arXiv:2204.13625v1 [cs.LG])
    Due to the unspecified and dynamic nature of data streams, online machine learning requires powerful and flexible solutions. However, evaluating online machine learning methods under realistic conditions is difficult. Existing work therefore often draws on different heuristics and simulations that do not necessarily produce meaningful and reliable results. Indeed, in the absence of common evaluation standards, it often remains unclear how online learning methods will perform in practice or in comparison to similar work. In this paper, we propose a comprehensive set of properties for high-quality machine learning in evolving data streams. In particular, we discuss sensible performance measures and evaluation strategies for online predictive modelling, online feature selection and concept drift detection. As one of the first works, we also look at the interpretability of online learning methods. The proposed evaluation standards are provided in a new Python framework called float. Float is completely modular and allows the simultaneous integration of common libraries, such as scikit-multiflow or river, with custom code. Float is open-sourced and can be accessed at https://github.com/haugjo/float. In this sense, we hope that our work will contribute to more standardized, reliable and realistic testing and comparison of online machine learning methods.
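    One of the standard evaluation strategies for online predictive modelling is prequential (test-then-train) evaluation, which the following self-contained sketch illustrates; the model class and learning rate are placeholders for the example and are not part of the float API.

```python
import random

class OnlineLinear:
    """Minimal online model: one weight plus bias, updated by SGD."""
    def __init__(self, lr=0.1):
        self.w, self.b, self.lr = 0.0, 0.0, lr

    def predict(self, x):
        return self.w * x + self.b

    def update(self, x, y):
        err = self.predict(x) - y
        self.w -= self.lr * err * x
        self.b -= self.lr * err

def prequential_eval(model, stream):
    """Test-then-train: every instance is first used for testing, then
    for training; returns the running mean squared error."""
    total, n = 0.0, 0
    for x, y in stream:
        err = model.predict(x) - y   # test first ...
        total += err * err
        n += 1
        model.update(x, y)           # ... then train
    return total / n
```

    Because every instance is scored before the model sees it, prequential error reflects genuine out-of-sample performance over the whole stream, including the adaptation phase after a concept drift.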
    Representative period selection for power system planning using autoencoder-based dimensionality reduction. (arXiv:2204.13608v1 [cs.LG])
    Power sector capacity expansion models (CEMs) that are used for studying future low-carbon grid scenarios must incorporate a detailed representation of grid operations. Often CEMs are formulated to model grid operations over representative periods that are sampled from the original input data using clustering algorithms. However, such representative period selection (RPS) methods are limited by the declining efficacy of the clustering algorithm with increasing dimensionality of the input data, and they do not consider the relative importance of input data variations on CEM outcomes. Here, we propose an RPS method that addresses these limitations by incorporating dimensionality reduction, accomplished via neural-network-based autoencoders, prior to clustering. Such dimensionality reduction not only improves the performance of the clustering algorithm, but also facilitates using additional features, such as estimated outputs produced from parallel solutions of simplified versions of the CEM for each disjoint period in the input data (e.g. 1 week). The impact of incorporating dimensionality reduction as part of RPS methods is quantified through the error in outcomes of the corresponding reduced-space CEM vs. the full-space CEM. Extensive numerical experimentation across various networks and a range of technology and policy scenarios establishes the superiority of the dimensionality-reduction-based RPS methods.
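    The encode-then-cluster recipe above can be illustrated with a minimal numpy sketch: a tied-weight linear autoencoder compresses each disjoint period, a plain k-means run clusters the latent codes, and the member nearest each centroid is kept as the representative period. All sizes and the synthetic data are illustrative assumptions; the paper's autoencoders and CEM-derived features are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a year of operational data: 52 disjoint weekly
# periods, each flattened to a 336-dimensional vector (e.g. 168 hours x 2
# features such as demand and renewable output).
periods = rng.normal(size=(52, 336))

def recon_loss(X, W):
    """Mean squared reconstruction error of the tied-weight autoencoder."""
    R = X @ W @ W.T - X
    return float((R ** 2).mean())

def train_linear_autoencoder(X, latent_dim=8, lr=1e-3, epochs=300):
    """Tied-weight linear autoencoder X ~ (X W) W^T, fit by gradient descent."""
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, latent_dim))
    losses = [recon_loss(X, W)]
    for _ in range(epochs):
        R = X @ W @ W.T - X                       # reconstruction residual
        W -= lr * (X.T @ R @ W + R.T @ X @ W) / n  # gradient of 0.5*mean ||R||^2
        losses.append(recon_loss(X, W))
    return W, losses

def kmeans(Z, k, iters=50):
    """Plain Lloyd's algorithm on the latent codes."""
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = Z[labels == j].mean(0)
    return labels, centers

W, losses = train_linear_autoencoder(periods)
Z = periods @ W                      # low-dimensional codes used for clustering
labels, centers = kmeans(Z, k=4)

# Representative period per cluster: the member closest to its centroid.
reps = [int(((Z - c) ** 2).sum(1).argmin()) for c in centers]
```

The point of the sketch is only the pipeline order (reduce dimensionality first, then cluster); a practical RPS method would also weight each representative period by its cluster size.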
    Toward Compositional Generalization in Object-Oriented World Modeling. (arXiv:2204.13661v1 [cs.LG])
    Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve that. We introduce a conceptual environment, Object Library, and two instances, and deploy a principled pipeline to measure the generalization ability. Motivated by the formulation, we analyze several methods with exact or no compositional generalization ability using our framework, and design a differentiable approach, Homomorphic Object-oriented World Model (HOWM), that achieves approximate but more efficient compositional generalization.
    DOTIN: Dropping Task-Irrelevant Nodes for GNNs. (arXiv:2204.13429v1 [cs.LG])
    Scalability is an important consideration for deep graph neural networks. Inspired by the conventional pooling layers in CNNs, many recent graph learning approaches have introduced pooling strategies to reduce the size of graphs for learning, such that scalability and efficiency can be improved. However, these pooling-based methods are mainly tailored to a single graph-level task and pay more attention to local information, limiting their performance in multi-task settings which often require task-specific global information. In this paper, departing from these pooling-based efforts, we design a new approach called DOTIN (Dropping Task-Irrelevant Nodes) to reduce the size of graphs. Specifically, by introducing $K$ learnable virtual nodes to represent the graph embeddings targeted at $K$ different graph-level tasks, up to 90% of the raw nodes with low attentiveness under an attention model -- a transformer in this paper -- can be adaptively dropped without a notable decrease in performance. Achieving almost the same accuracy, our method speeds up GAT by about 50% on graph-level tasks including graph classification and graph edit distance (GED), with about 60% less memory, on the D&D dataset. Code will be made publicly available at https://github.com/Sherrylone/DOTIN.
    Prescriptive and Descriptive Approaches to Machine-Learning Transparency. (arXiv:2204.13582v1 [cs.SE])
    Specialized documentation techniques have been developed to communicate key facts about machine-learning (ML) systems and the datasets and models they rely on. Techniques such as Datasheets, FactSheets, and Model Cards have taken a mainly descriptive approach, providing various details about the system components. While the above information is essential for product developers and external experts to assess whether the ML system meets their requirements, other stakeholders might find it less actionable. In particular, ML engineers need guidance on how to mitigate potential shortcomings in order to fix bugs or improve the system's performance. We survey approaches that aim to provide such guidance in a prescriptive way. We further propose a preliminary approach, called Method Cards, which aims to increase the transparency and reproducibility of ML systems by providing prescriptive documentation of commonly-used ML methods and techniques. We showcase our proposal with an example in small object detection, and demonstrate how Method Cards can communicate key considerations for model developers. We further highlight avenues for improving the user experience of ML engineers based on Method Cards.
    Federated Learning on Heterogeneous and Long-Tailed Data via Classifier Re-Training with Federated Features. (arXiv:2204.13399v1 [cs.LG])
    Federated learning (FL) provides a privacy-preserving solution for distributed machine learning tasks. One challenging problem that severely damages the performance of FL models is the co-occurrence of data heterogeneity and long-tail distribution, which frequently appears in real FL applications. In this paper, we reveal an intriguing fact that the biased classifier is the primary factor leading to the poor performance of the global model. Motivated by this finding, we propose a novel and privacy-preserving FL method for heterogeneous and long-tailed data via Classifier Re-training with Federated Features (CReFF). The classifier re-trained on federated features can produce performance comparable to that of a classifier re-trained on real data, in a privacy-preserving manner without information leakage of local data or class distribution. Experiments on several benchmark datasets show that the proposed CReFF is an effective solution to obtain a promising FL model under heterogeneous and long-tailed data. Comparative results with the state-of-the-art FL methods also validate the superiority of CReFF. Our code is available at https://github.com/shangxinyi/CReFF-FL.
    Worst-Case Dynamic Power Distribution Network Noise Prediction Using Convolutional Neural Network. (arXiv:2204.13109v1 [cs.LG])
    Worst-case dynamic PDN noise analysis is an essential step in PDN sign-off to ensure the performance and reliability of chips. However, with the growing PDN size and increasing scenarios to be validated, it becomes very time- and resource-consuming to conduct full-stack PDN simulation to check the worst-case noise for different test vectors. Recently, various works have proposed machine learning based methods for supply noise prediction, many of which still suffer from large training overhead, inefficiency, or non-scalability. Thus, this paper proposes an efficient and scalable framework for worst-case dynamic PDN noise prediction. The framework first reduces the spatial and temporal redundancy in the PDN and input current vector, and then employs efficient feature extraction as well as a novel convolutional neural network architecture to predict the worst-case dynamic PDN noise. Experimental results show that the proposed framework consistently outperforms the commercial tool and the state-of-the-art machine learning method with only 0.63-1.02% mean relative error and a 25-69$\times$ speedup.
    Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework. (arXiv:2204.13207v1 [cs.CV])
    Current contrastive learning frameworks focus on leveraging a single supervisory signal to learn representations, which limits the efficacy on unseen data and downstream tasks. In this paper, we present a hierarchical multi-label representation learning framework that can leverage all available labels and preserve the hierarchical relationship between classes. We introduce novel hierarchy-preserving losses, which jointly apply a hierarchical penalty to the contrastive loss and enforce the hierarchy constraint. The loss function is data driven and automatically adapts to arbitrary multi-label structures. Experiments on several datasets show that our relationship-preserving embedding performs well on a variety of tasks and outperforms the baseline supervised and self-supervised approaches. Code is available at https://github.com/salesforce/hierarchicalContrastiveLearning.
    Neural network controllers for uncertain linear systems. (arXiv:2204.13209v1 [eess.SY])
    We consider the design of reliable neural network (NN)-based approximations of traditional stabilizing controllers for linear systems affected by polytopic uncertainty, including controllers with variable structure and those based on a minimal selection policy. We develop a systematic procedure to certify the closed-loop stability and performance of a polytopic system when a rectified linear unit (ReLU)-based approximation replaces such traditional controllers. We provide sufficient conditions to ensure stability involving the worst-case approximation error and the Lipschitz constant characterizing the error function between ReLU-based and traditional controller-based state-to-input mappings, and further provide offline, mixed-integer optimization-based methods that allow us to compute those quantities exactly.
    Epileptic Seizure Classification Using Combined Labels and a Genetic Algorithm. (arXiv:2110.01742v4 [eess.SP] UPDATED)
    Epilepsy affects 50 million people worldwide and is one of the most common serious neurological disorders. Seizure detection and classification is a valuable tool for diagnosing and managing the condition. An automated classification algorithm will allow for accurate diagnosis. Utilising the Temple University Hospital (TUH) Seizure Corpus, six seizure types are compared: absence, complex partial, myoclonic, simple partial, tonic and tonic-clonic. This study proposes a method that utilises unique features with a novel parallel classifier -- the Parallel Genetic Naive Bayes (NB) Seizure Classifier (PGNBSC). The PGNBSC algorithm searches through the features and, by reclassifying the data each time, creates a matrix for the optimum search criteria. Ictal states from the EEGs are segmented into 1.8 s windows, and the epochs are then further decomposed into 13 different features from the first intrinsic mode function (IMF). The features are compared using an original NB classifier in the first model. This is improved upon in a second model by using a genetic algorithm (Binary Grey Wolf Optimisation, Option 1) with an NB classifier. The third model uses a combination of the simple partial and complex partial seizures and provides the highest classification accuracy for each of the six seizures amongst the three models (20%, 53%, and 85% for the first, second, and third model, respectively).
    A Decision Model for Federated Learning Architecture Pattern Selection. (arXiv:2204.13291v1 [cs.LG])
    Federated learning is growing fast in both academia and industry to resolve data hungriness and privacy issues in machine learning. A federated learning system, being widely distributed with different components and stakeholders, requires software system design thinking. For instance, multiple patterns and tactics have been summarised by researchers that cover various aspects, from client management and training configuration to model deployment. However, the multitude of patterns leaves designers confused about when and which pattern to adopt or adapt. Therefore, in this paper, we present a set of decision models to assist designers and architects who have limited knowledge of federated learning in selecting architectural patterns for federated learning architecture design. Each decision model maps functional and non-functional requirements of federated learning systems to a set of patterns. We also clarify the trade-offs that may be implicit in the patterns. We evaluated the decision models through a set of interviews with practitioners to assess their correctness and usefulness in guiding the architecture design process through various design decision options.
    Efficient-VDVAE: Less is more. (arXiv:2203.13751v2 [cs.LG] UPDATED)
    Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE to make it converge up to $2.6\times$ faster, save up to $20\times$ in memory load and improve stability during training. Despite these changes, our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models on all $7$ commonly used image datasets we evaluated on. We also make an argument against using 5-bit benchmarks as a way to measure hierarchical VAEs' performance due to undesirable biases caused by the 5-bit quantization. Additionally, we empirically demonstrate that roughly $3\%$ of the hierarchical VAE's latent space dimensions are sufficient to encode most of the image information without loss of performance, opening up the doors to efficiently leveraging the hierarchical VAEs' latent space in downstream tasks. We release our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE.
    Attention Mechanism in Neural Networks: Where it Comes and Where it Goes. (arXiv:2204.13154v1 [cs.LG])
    The idea of incorporating a mechanism inspired by the human visual system into neural networks was introduced in the machine learning literature long ago. This idea is named the attention mechanism, and it has gone through a long development period. Today, many works have been devoted to this idea in a variety of tasks, and remarkable performance has recently been demonstrated. The goal of this paper is to provide an overview from the early work on searching for ways to implement the attention idea with neural networks up to the recent trends. This review emphasizes the important milestones of this progress across different tasks. In this way, this study aims to provide a road map for researchers to explore the current development and get inspired to pursue novel approaches beyond attention.
    Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center. (arXiv:2201.11727v3 [cs.DC] UPDATED)
    This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible to changing workload distributions and arrival rates, with a poor balance among multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system at moderate to large scale. Experiments on realistic testbeds show that independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has superior performance over different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.
    DIANES: A DEI Audit Toolkit for News Sources. (arXiv:2203.11383v2 [cs.IR] UPDATED)
    Professional news media organizations have always touted the importance that they give to multiple perspectives. However, in practice the traditional all-sides approach has favored people in the dominant culture. Hence it has come under ethical critique under the new norms of diversity, equity, and inclusion (DEI). When DEI is applied to journalism, it goes beyond conventional notions of impartiality and bias and instead democratizes the journalistic practice of sourcing -- who is quoted or interviewed, who is not, how often, from which demographic group, gender, and so forth. There is currently no real-time or on-demand tool in the hands of reporters to analyze the persons they quote. In this paper, we present DIANES, a DEI Audit Toolkit for News Sources. It consists of a natural language processing pipeline on the backend to extract quotes, speakers, titles, and organizations from news articles in real time. On the frontend, DIANES offers WordPress plugins, a Web monitor, and a DEI annotation API service to help news media monitor their own quoting patterns and push themselves towards DEI norms.
    Pedagogical Rule Extraction to Learn Interpretable Models - an Empirical Study. (arXiv:2112.13285v2 [cs.LG] UPDATED)
    Machine-learning models are ubiquitous. In some domains, for instance in medicine, the models' predictions must be interpretable. Decision trees, classification rules, and subgroup discovery are three broad categories of supervised machine-learning models presenting knowledge in the form of interpretable rules. The accuracy of these models learned from small datasets is usually low, and obtaining larger datasets is often hard or even impossible. Pedagogical rule extraction methods could help to learn better rules from small data by augmenting a dataset employing statistical models and using it to learn a rule-based model. However, the existing evaluation of these methods is often inconclusive, and they had not been compared with one another so far. Our framework PRELIM unifies existing pedagogical rule extraction techniques. In extensive experiments, we identified promising PRELIM configurations not studied before.
    Continual learning-based probabilistic slow feature analysis for multimode dynamic process monitoring. (arXiv:2202.11295v2 [cs.LG] UPDATED)
    In this paper, a novel multimode dynamic process monitoring approach is proposed by extending elastic weight consolidation (EWC) to probabilistic slow feature analysis (PSFA) in order to extract multimode slow features for online monitoring. EWC was originally introduced in the setting of machine learning of sequential multi-tasks with the aim of avoiding the catastrophic forgetting issue, which likewise poses a major challenge in multimode dynamic process monitoring. When a new mode arrives, a set of data should be collected so that this mode can be identified by PSFA and prior knowledge. Then, a regularization term is introduced to prevent new data from significantly interfering with the learned knowledge, where parameter importance measures are estimated. The proposed method, denoted PSFA-EWC, is updated continually and capable of achieving excellent performance for successive modes. Different from traditional multimode monitoring algorithms, PSFA-EWC furnishes backward and forward transfer ability. The significant features of previous modes are retained while consolidating new information, which may contribute to learning new relevant modes. Compared with several known methods, the effectiveness of the proposed method is demonstrated via a continuous stirred tank heater and a practical coal pulverizing system.
    Low-Rank Approximation with $1/\epsilon^{1/3}$ Matrix-Vector Products. (arXiv:2202.05120v2 [cs.DS] UPDATED)
    We study iterative methods based on Krylov subspaces for low-rank approximation under any Schatten-$p$ norm. Here, given access to a matrix $A$ through matrix-vector products, an accuracy parameter $\epsilon$, and a target rank $k$, the goal is to find a rank-$k$ matrix $Z$ with orthonormal columns such that $\| A(I -ZZ^\top)\|_{S_p} \leq (1+\epsilon)\min_{U^\top U = I_k} \|A(I - U U^\top)\|_{S_p}$, where $\|M\|_{S_p}$ denotes the $\ell_p$ norm of the singular values of $M$. For the special cases of $p=2$ (Frobenius norm) and $p = \infty$ (Spectral norm), Musco and Musco (NeurIPS 2015) obtained an algorithm based on Krylov methods that uses $\tilde{O}(k/\sqrt{\epsilon})$ matrix-vector products, improving on the na\"ive $\tilde{O}(k/\epsilon)$ dependence obtainable by the power method, where $\tilde{O}$ suppresses poly$(\log(dk/\epsilon))$ factors. Our main result is an algorithm that uses only $\tilde{O}(kp^{1/6}/\epsilon^{1/3})$ matrix-vector products, and works for all $p \geq 1$. For $p = 2$ our bound improves the previous $\tilde{O}(k/\epsilon^{1/2})$ bound to $\tilde{O}(k/\epsilon^{1/3})$. Since the Schatten-$p$ and Schatten-$\infty$ norms are the same up to a $1+ \epsilon$ factor when $p \geq (\log d)/\epsilon$, our bound recovers the result of Musco and Musco for $p = \infty$. Further, we prove a matrix-vector query lower bound of $\Omega(1/\epsilon^{1/3})$ for any fixed constant $p \geq 1$, showing that surprisingly $\tilde{\Theta}(1/\epsilon^{1/3})$ is the optimal complexity for constant~$k$. To obtain our results, we introduce several new techniques, including optimizing over multiple Krylov subspaces simultaneously, and pinching inequalities for partitioned operators. Our lower bound for $p \in [1,2]$ uses the Araki-Lieb-Thirring trace inequality, whereas for $p>2$, we appeal to a norm-compression inequality for aligned partitioned operators.
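    For readers unfamiliar with block Krylov methods, the basic single-subspace version (the starting point the paper improves on, not its multi-subspace algorithm) can be sketched in numpy: build the subspace spanned by $AG, (AA^\top)AG, \dots$, project $A$ onto it, and read off a near-optimal rank-$k$ factor $Z$ with orthonormal columns. The test matrix with a decaying spectrum is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 200 x 120 matrix with a geometrically decaying spectrum
# (illustrative; the guarantees in the abstract hold for arbitrary matrices).
d = 120
s = 0.9 ** np.arange(d)
Q1, _ = np.linalg.qr(rng.normal(size=(200, d)))
Q2, _ = np.linalg.qr(rng.normal(size=(d, d)))
A = (Q1 * s) @ Q2.T

k, q = 10, 4                       # target rank, number of Krylov blocks

# Block Krylov subspace: [AG, (AA^T)AG, ..., (AA^T)^{q-1}AG]
G = rng.normal(size=(d, k))
blocks, Y = [], A @ G
for _ in range(q):
    blocks.append(Y)
    Y = A @ (A.T @ Y)              # each block costs 2k matrix-vector products
K, _ = np.linalg.qr(np.concatenate(blocks, axis=1))

# Rank-k factor with orthonormal columns, as in the abstract's objective
B = K.T @ A
_, _, Vt = np.linalg.svd(B, full_matrices=False)
Z = Vt[:k].T

err = np.linalg.norm(A - A @ Z @ Z.T)      # ||A(I - ZZ^T)||_F  (p = 2 case)
S = np.linalg.svd(A, compute_uv=False)
best = np.sqrt((S[k:] ** 2).sum())         # optimal rank-k Frobenius error
```

Here `err` lands within a small factor of `best` while only touching `A` through matrix-vector products, which is the resource the paper's $\tilde{O}(kp^{1/6}/\epsilon^{1/3})$ bound counts.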
    Experimental quantum advantage with quantum coupon collector. (arXiv:2112.07884v2 [quant-ph] UPDATED)
    An increasing number of communication and computational schemes with quantum advantages have recently been proposed, which implies that quantum technology has fertile application prospects. However, demonstrating these schemes experimentally continues to be a central challenge because of the difficulty in preparing high-dimensional states or highly entangled states. In this study, we introduce and analyse a quantum coupon collector protocol by employing coherent states and simple linear optical elements, which was successfully demonstrated using realistic experimental equipment. We showed that our protocol can significantly reduce the number of samples needed to learn a specific set compared with the classical limit of the coupon collector problem. We also discuss the potential values and expansions of the quantum coupon collector by constructing a quantum blind box game. The information transmitted by the proposed game also exceeded the classical limit. These results strongly demonstrate the advantages of quantum mechanics in machine learning and communication complexity.
    On Exploiting Layerwise Gradient Statistics for Effective Training of Deep Neural Networks. (arXiv:2203.13273v3 [cs.LG] UPDATED)
    Adam and AdaBelief compute and make use of elementwise adaptive stepsizes in training deep neural networks (DNNs) by tracking the exponential moving average (EMA) of the squared-gradient g_t^2 and the squared prediction error (m_t-g_t)^2, respectively, where m_t is the first momentum at iteration t and can be viewed as a prediction of g_t. In this work, we investigate if layerwise gradient statistics can be exploited in Adam and AdaBelief to allow for more effective training of DNNs. We address the above research question in two steps. Firstly, we slightly modify Adam and AdaBelief by introducing layerwise adaptive stepsizes in their update procedures via either pre- or post-processing. Our empirical results indicate that the slight modification produces comparable performance for training VGG and ResNet models over CIFAR10 and CIFAR100, suggesting that layer-wise gradient statistics play an important role towards the success of Adam and AdaBelief for at least certain DNN tasks. In the second step, we propose Aida, a new optimisation method, with the objective that the elementwise stepsizes within each layer have significantly smaller statistical variances, and the layerwise average stepsizes are much more compact across all the layers. Motivated by the fact that (m_t-g_t)^2 in AdaBelief is conservative in comparison to g_t^2 in Adam in terms of layerwise statistical averages and variances, Aida is designed by tracking a more conservative function of m_t and g_t than (m_t-g_t)^2 via layerwise vector projections. Experimental results show that Aida produces either competitive or better performance with respect to a number of existing methods including Adam and AdaBelief for a set of challenging DNN tasks.
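    The contrast between the two second-moment trackers in the abstract fits in one short numpy loop: the only difference between Adam and AdaBelief is whether the EMA v_t accumulates g_t^2 or (m_t - g_t)^2. This is a minimal sketch of those baselines on a toy quadratic; Aida's layerwise vector projections are not reproduced here.

```python
import numpy as np

def adam_like(grad_fn, x0, belief=False, lr=0.01, b1=0.9, b2=0.999,
              eps=1e-8, steps=500):
    """One loop covering both Adam (belief=False) and AdaBelief (belief=True).

    Adam's second moment is an EMA of g_t^2; AdaBelief instead tracks an EMA
    of (m_t - g_t)^2, the squared prediction error of the first momentum m_t."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = b1 * m + (1 - b1) * g
        s = (m - g) ** 2 if belief else g ** 2
        v = b2 * v + (1 - b2) * s
        mhat = m / (1 - b1 ** t)          # bias-corrected first moment
        vhat = v / (1 - b2 ** t)          # bias-corrected second moment
        x -= lr * mhat / (np.sqrt(vhat) + eps)
    return x

# Ill-conditioned quadratic f(x) = 0.5 * sum_i D_i x_i^2
D = np.array([100.0, 1.0, 0.01])
grad = lambda x: D * x
x_adam = adam_like(grad, np.ones(3))
x_belief = adam_like(grad, np.ones(3), belief=True)
```

A layerwise variant, in the spirit of the paper's first step, would replace the elementwise `vhat` by a statistic shared across each parameter block.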
    Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes. (arXiv:2106.15380v3 [cs.LG] UPDATED)
    In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity than that of a flat learner, and we validate this empirically in several experiments.
    Multiplicative Updates for NMF with $\beta$-Divergences under Disjoint Equality Constraints. (arXiv:2010.16223v2 [cs.LG] UPDATED)
    Nonnegative matrix factorization (NMF) is the problem of approximating an input nonnegative matrix, $V$, as the product of two smaller nonnegative matrices, $W$ and $H$. In this paper, we introduce a general framework to design multiplicative updates (MU) for NMF based on $\beta$-divergences ($\beta$-NMF) with disjoint equality constraints, and with penalty terms in the objective function. By disjoint, we mean that each variable appears in at most one equality constraint. Our MU satisfy the set of constraints after each update of the variables during the optimization process, while guaranteeing that the objective function decreases monotonically. We showcase this framework on three NMF models, and show that it competes favorably with the state of the art: (1)~$\beta$-NMF with sum-to-one constraints on the columns of $H$, (2) minimum-volume $\beta$-NMF with sum-to-one constraints on the columns of $W$, and (3) sparse $\beta$-NMF with $\ell_2$-norm constraints on the columns of $W$.
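    As background for the MU framework above, the classic unconstrained case for $\beta = 2$ (the Frobenius norm, Lee-Seung updates) can be sketched in a few lines of numpy: each multiplicative update preserves nonnegativity and never increases the objective, which is exactly the monotonicity property the paper extends to constrained and penalized $\beta$-NMF.

```python
import numpy as np

rng = np.random.default_rng(2)
V = rng.random((40, 30))          # nonnegative input matrix
r, eps = 5, 1e-12                 # factorization rank; small safeguard
W = rng.random((40, r))
H = rng.random((r, 30))

errs = []
for _ in range(100):
    # Classic Lee-Seung multiplicative updates for beta = 2 (Frobenius norm).
    # Each update keeps its factor nonnegative and never increases ||V - WH||_F.
    H *= (W.T @ V) / (W.T @ W @ H + eps)
    W *= (V @ H.T) / (W @ H @ H.T + eps)
    errs.append(float(np.linalg.norm(V - W @ H)))
```

The paper's contribution is to derive updates of this multiplicative form that additionally land exactly on disjoint equality constraints (e.g. sum-to-one columns) after every iteration, which a naive post-hoc renormalization of these updates would not guarantee.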
    Efficient Geometry-aware 3D Generative Adversarial Networks. (arXiv:2112.07945v2 [cs.CV] UPDATED)
    Unsupervised generation of high-quality multi-view-consistent images and 3D shapes using only collections of single-view 2D photographs has been a long-standing challenge. Existing 3D GANs are either compute-intensive or make approximations that are not 3D-consistent; the former limits quality and resolution of the generated images and the latter adversely affects multi-view consistency and shape quality. In this work, we improve the computational efficiency and image quality of 3D GANs without overly relying on these approximations. We introduce an expressive hybrid explicit-implicit network architecture that, together with other design choices, synthesizes not only high-resolution multi-view-consistent images in real time but also produces high-quality 3D geometry. By decoupling feature generation and neural rendering, our framework is able to leverage state-of-the-art 2D CNN generators, such as StyleGAN2, and inherit their efficiency and expressiveness. We demonstrate state-of-the-art 3D-aware synthesis with FFHQ and AFHQ Cats, among other experiments.
    On Evaluation Metrics for Graph Generative Models. (arXiv:2201.09871v2 [cs.LG] UPDATED)
    In image generation, generative models can be evaluated naturally by visually inspecting model outputs. However, this is not always the case for graph generative models (GGMs), making their evaluation challenging. Currently, the standard process for evaluating GGMs suffers from three critical limitations: i) it does not produce a single score which makes model selection challenging, ii) in many cases it fails to consider underlying edge and node features, and iii) it is prohibitively slow to perform. In this work, we mitigate these issues by searching for scalar, domain-agnostic, and scalable metrics for evaluating and ranking GGMs. To this end, we study existing GGM metrics and neural-network-based metrics emerging from generative models of images that use embeddings extracted from a task-specific network. Motivated by the power of certain Graph Neural Networks (GNNs) to extract meaningful graph representations without any training, we introduce several metrics based on the features extracted by an untrained random GNN. We design experiments to thoroughly test metrics on their ability to measure the diversity and fidelity of generated graphs, as well as their sample and computational efficiency. Depending on the quantity of samples, we recommend one of two random-GNN-based metrics that we show to be more expressive than pre-existing metrics. While we focus on applying these metrics to GGM evaluation, in practice this enables the ability to easily compute the dissimilarity between any two sets of graphs regardless of domain. Our code is released at: https://github.com/uoguelph-mlrg/GGM-metrics.
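    The key trick in the abstract (graph features from an untrained, randomly initialized GNN) is easy to sketch: run a couple of random message-passing layers with weights shared across graphs, mean-pool to a graph embedding, and compare sets of graphs by the distance between their mean embeddings. The architecture, the degree input feature, and the Erdos-Renyi test sets below are illustrative assumptions, not the paper's exact metrics.

```python
import numpy as np

rng = np.random.default_rng(3)

def random_gnn_embedding(adj, feat_dim=16, layers=2, seed=0):
    """Graph-level embedding from an UNTRAINED message-passing network:
    X <- relu(A_hat X W) with fixed random weights, then mean pooling.
    The same seed is reused so all graphs share the random weights."""
    wrng = np.random.default_rng(seed)
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                       # add self-loops
    a_hat = a_hat / a_hat.sum(1, keepdims=True)   # row-normalised propagation
    x = adj.sum(1, keepdims=True)                 # node degree as input feature
    for _ in range(layers):
        w = wrng.normal(size=(x.shape[1], feat_dim)) / np.sqrt(x.shape[1])
        x = np.maximum(a_hat @ x @ w, 0.0)
    return x.mean(axis=0)

def er_graph(n, p):
    a = np.triu((rng.random((n, n)) < p).astype(float), 1)
    return a + a.T

# Two sets of Erdos-Renyi graphs with clearly different edge densities.
set_a = [er_graph(20, 0.2) for _ in range(10)]
set_b = [er_graph(20, 0.6) for _ in range(10)]
mean_emb = lambda gs: np.stack([random_gnn_embedding(g) for g in gs]).mean(0)

dist_ab = np.linalg.norm(mean_emb(set_a) - mean_emb(set_b))
dist_aa = np.linalg.norm(mean_emb(set_a[:5]) - mean_emb(set_a[5:]))
```

Structurally different sets should be far apart (`dist_ab`) while two halves of the same distribution stay close (`dist_aa`), which is the diversity/fidelity behaviour the paper's experiments test much more rigorously.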
    Trivial or impossible -- dichotomous data difficulty masks model differences (on ImageNet and beyond). (arXiv:2110.05922v3 [cs.CV] UPDATED)
    "The power of a generalization system follows directly from its biases" (Mitchell 1980). Today, CNNs are incredibly powerful generalisation systems -- but to what degree have we understood how their inductive bias influences model decisions? We here attempt to disentangle the various aspects that determine how a model decides. In particular, we ask: what makes one model decide differently from another? In a meticulously controlled setting, we find that (1.) irrespective of the network architecture or objective (e.g. self-supervised, semi-supervised, vision transformers, recurrent models) all models end up with a similar decision boundary. (2.) To understand these findings, we analysed model decisions on the ImageNet validation set from epoch to epoch and image by image. We find that the ImageNet validation set, among others, suffers from dichotomous data difficulty (DDD): For the range of investigated models and their accuracies, it is dominated by 46.0% "trivial" and 11.5% "impossible" images (beyond label errors). Only 42.5% of the images could possibly be responsible for the differences between two models' decision boundaries. (3.) Only removing the "impossible" and "trivial" images allows us to see pronounced differences between models. (4.) Humans are highly accurate at predicting which images are "trivial" and "impossible" for CNNs (81.4%). This implies that in future comparisons of brains, machines and behaviour, much may be gained from investigating the decisive role of images and the distribution of their difficulties.
    Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models. (arXiv:2108.12657v2 [cs.LG] UPDATED)
    Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as neural networks. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternately updates the normalizing flow parameters and the surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.
    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control. (arXiv:2110.01052v4 [cs.LG] UPDATED)
    We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithm works with any underlying model and (unknown) data-generating distribution and does not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision and tabular medical data.
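    The reframing as multiple hypothesis testing can be illustrated with a minimal sketch: for each candidate threshold on a grid, a Hoeffding-style p-value tests the null hypothesis that its risk exceeds the target level, and a Bonferroni correction over the grid controls the family-wise error. The grid, concentration bound, and correction below are illustrative simplifications, not the paper's full procedure.

```python
import numpy as np

def learn_then_test(losses, alpha=0.1, delta=0.05):
    """Toy Learn-then-Test calibration (a sketch, not the authors' code).

    losses: array of shape (n_grid, n_cal) -- loss in [0, 1] of each
            candidate threshold lambda on each calibration point.
    Returns indices of lambdas certified to have risk <= alpha with
    probability >= 1 - delta (Bonferroni over the grid).
    """
    n_grid, n_cal = losses.shape
    risk_hat = losses.mean(axis=1)  # empirical risk per candidate lambda
    # Hoeffding p-value for H0: risk(lambda) > alpha (p = 1 when risk_hat >= alpha)
    p_vals = np.exp(-2.0 * n_cal * np.clip(alpha - risk_hat, 0.0, None) ** 2)
    # Bonferroni correction over the n_grid hypotheses
    return np.where(p_vals <= delta / n_grid)[0]
```

Any lambda in the returned set carries the finite-sample guarantee, so a downstream user is free to pick, e.g., the least conservative one.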
    On Expressivity and Trainability of Quadratic Networks. (arXiv:2110.06081v2 [cs.LG] UPDATED)
    Inspired by the diversity of biological neurons, quadratic artificial neurons can play an important role in deep learning models. The type of quadratic neuron of interest here replaces the inner-product operation in the conventional neuron with a quadratic function. Despite promising results achieved so far by networks of quadratic neurons, important issues remain unaddressed. Theoretically, the superior expressivity of a quadratic network over either a conventional network or a conventional network with quadratic activation is not fully elucidated, which leaves the use of quadratic networks not well grounded. Practically, although a quadratic network can be trained via generic backpropagation, it can be subject to a higher risk of collapse than its conventional counterpart. To address these issues, we first apply spline theory and a measure from algebraic geometry to give two theorems demonstrating better model expressivity of a quadratic network than the conventional counterpart with or without quadratic activation. Then, we propose an effective and efficient training strategy, referred to as ReLinear, to stabilize the training process of a quadratic network, thereby unleashing its full potential in the associated machine learning tasks. Comprehensive experiments on popular datasets are performed to support our findings and evaluate the performance of quadratic deep learning.
    Artificial Text Detection via Examining the Topology of Attention Maps. (arXiv:2109.04825v2 [cs.CL] UPDATED)
    The impressive capability of recent generative models to create texts that are challenging to distinguish from human-written ones can be misused for generating fake news, product reviews, and even abusive content. Despite the prominent performance of existing methods for artificial text detection, they still lack interpretability and robustness towards unseen models. To this end, we propose three novel types of interpretable topological features for this task based on Topological Data Analysis (TDA), which is currently understudied in the field of NLP. We empirically show that the features derived from the BERT model outperform count- and neural-based baselines by up to 10\% on three common datasets, and tend to be the most robust towards unseen GPT-style generation models, as opposed to existing methods. A probing analysis of the features reveals their sensitivity to surface and syntactic properties. The results demonstrate that TDA is a promising line of research for NLP tasks, specifically those that incorporate surface and structural information.
    SCRDet++: Detecting Small, Cluttered and Rotated Objects via Instance-Level Feature Denoising and Rotation Loss Smoothing. (arXiv:2004.13316v2 [cs.CV] UPDATED)
    Small and cluttered objects are common in real-world scenes and are challenging to detect. The difficulty is further pronounced when the objects are rotated, as traditional detectors routinely locate objects with horizontal bounding boxes, such that the region of interest is contaminated with background or nearby interleaved objects. In this paper, we first introduce the idea of denoising to object detection. Instance-level denoising on the feature map is performed to enhance the detection of small and cluttered objects. To handle rotation variation, we also add a novel IoU constant factor to the smooth L1 loss to address the long-standing boundary problem, which, by our analysis, is mainly caused by the periodicity of angle (PoA) and exchangeability of edges (EoE). Combining these two features, our proposed detector is termed SCRDet++. Extensive experiments are performed on the large public aerial image datasets DOTA, DIOR and UCAS-AOD, as well as the natural image dataset COCO, the scene text dataset ICDAR2015, the small traffic light dataset BSTLD, and the S$^2$TLD dataset released with this paper. The results show the effectiveness of our approach. The released S$^2$TLD dataset is made publicly available; it contains 5,786 images with 14,130 traffic light instances across five categories.
    Policy Gradient Stock GAN for Realistic Discrete Order Data Generation in Financial Markets. (arXiv:2204.13338v1 [cs.LG])
    This study proposes a new generative adversarial network (GAN) for generating realistic orders in financial markets. Previous GANs for financial markets generated fake orders in continuous spaces because of learning limitations of GAN architectures. In reality, however, orders are discrete: order prices have a minimum price unit, and order types are categorical. Thus, in this study we change the generation method so that the generated fake orders lie in discrete spaces. Because this change disables the ordinary GAN learning algorithm, we employ a policy gradient, frequently used in reinforcement learning, as the learning algorithm. Through our experiments, we show that our proposed model outperforms previous models in terms of the generated order distribution. As an additional benefit of introducing the policy gradient, the entropy of the generated policy can be used to check the GAN's learning status. In the future, higher-performance GANs, better evaluation methods, or applications of our GANs can be addressed.
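    The policy-gradient workaround for discrete outputs can be illustrated with a toy REINFORCE loop: a categorical policy samples discrete order-price ticks, a stand-in reward plays the role of the discriminator score, and the policy entropy serves as the learning diagnostic mentioned above. All names, dimensions, and values here are illustrative, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Toy setup: the generator is a categorical policy over K discrete
# order-price ticks; the "discriminator" rewards orders whose tick
# matches a hypothetical real-data mode.
K, target = 8, 3
logits = np.zeros(K)

def reward(tick):  # stand-in for a discriminator score
    return 1.0 if tick == target else 0.0

lr = 0.5
for _ in range(200):
    probs = softmax(logits)
    tick = rng.choice(K, p=probs)  # sample a discrete fake order
    # REINFORCE: grad of log pi(tick) w.r.t. logits = onehot(tick) - probs
    grad_logp = -probs
    grad_logp[tick] += 1.0
    logits += lr * reward(tick) * grad_logp

# The entropy of the learned policy can be monitored as a training diagnostic.
probs = softmax(logits)
entropy = -(probs * np.log(probs + 1e-12)).sum()
```

As training progresses the policy concentrates on rewarded ticks and its entropy falls, which is exactly the kind of learning-status signal the abstract describes.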
    Machine learning on DNA-encoded library count data using an uncertainty-aware probabilistic loss function. (arXiv:2108.12471v2 [q-bio.QM] UPDATED)
    DNA-encoded library (DEL) screening and quantitative structure-activity relationship (QSAR) modeling are two techniques used in drug discovery to find small molecules that bind a protein target. Applying QSAR modeling to DEL data can facilitate the selection of compounds for off-DNA synthesis and evaluation. Such a combined approach has been shown recently by training binary classifiers to learn DEL enrichments of aggregated "disynthons" to accommodate the sparse and noisy nature of DEL data. However, a binary classifier cannot distinguish between different levels of enrichment, and information is potentially lost during disynthon aggregation. Here, we demonstrate a regression approach to learning DEL enrichments of individual molecules using a custom negative log-likelihood loss function that effectively denoises DEL data and introduces opportunities for visualization of learned structure-activity relationships (SAR). Our approach explicitly models the Poisson statistics of the sequencing process used in the DEL experimental workflow under a frequentist view. We illustrate this approach on a dataset of 108k compounds screened against CAIX, and a dataset of 5.7M compounds screened against sEH and SIRT2. Due to the treatment of uncertainty in the data through the negative log-likelihood loss function, the models can ignore low-confidence outliers. While our approach does not demonstrate a benefit for extrapolation to novel structures, we expect our denoising and visualization pipeline to be useful in identifying SAR trends and enriched pharmacophores in DEL data. Further, this approach to uncertainty-aware regression is applicable to other sparse or noisy datasets where the nature of stochasticity is known or can be modeled; in particular, the Poisson enrichment ratio metric we use can apply to other settings that compare sequencing count data between two experimental conditions.
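    The heart of such a loss is the Poisson negative log-likelihood of the observed read counts under a predicted rate. A minimal sketch, assuming the model outputs a log-rate (not the authors' exact parameterization):

```python
import numpy as np

def poisson_nll(pred_log_rate, counts):
    """Poisson negative log-likelihood, dropped count-only constant (log k!).

    pred_log_rate: model output, log of the expected sequencing count.
    counts: observed DEL read counts.
    A sketch of the kind of uncertainty-aware loss the paper describes.
    """
    rate = np.exp(pred_log_rate)
    # NLL of k ~ Poisson(rate) is rate - k*log(rate) + log(k!)
    return (rate - counts * pred_log_rate).mean()
```

Because the Poisson likelihood is flat at low counts, low-confidence outliers contribute only weak gradients, which is one way a model can "ignore" them.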
    Predicting S&P500 Index direction with Transfer Learning and a Causal Graph as main Input. (arXiv:2011.13113v3 [q-fin.ST] UPDATED)
    We propose a unified multi-tasking framework to represent the complex and uncertain causal process of financial market dynamics, and then to predict the movement of any type of index with an application on the monthly direction of the S&P500 index. our solution is based on three main pillars: (i) the use of transfer learning to share knowledge and feature (representation, learning) between all financial markets, increase the size of the training sample and preserve the stability between training, validation and test sample. (ii) The combination of multidisciplinary knowledge (Financial economics, behavioral finance, market microstructure and portfolio construction theories) to represent a global top-down dynamics of any financial market, through a graph. (iii) The integration of forward looking unstructured data, different types of contexts (long, medium and short term) through latent variables/nodes and then, use a unique VAE network (parameter sharing) to learn simultaneously their distributional representation. We obtain Accuracy, F1-score, and Matthew Correlation of 74.3 %, 67 % and 0.42 above the industry and other benchmark on 12 years test period which include three unstable and difficult sub-period to predict.
    Bilinear value networks. (arXiv:2204.13695v1 [cs.AI])
    The dominant framework for off-policy multi-goal reinforcement learning involves estimating a goal-conditioned Q-value function. When learning to achieve multiple goals, data efficiency is intimately connected with the generalization of the Q-function to new goals. The de facto paradigm is to approximate Q(s, a, g) using monolithic neural networks. To improve the generalization of the Q-function, we propose a bilinear decomposition that represents the Q-value via a low-rank approximation in the form of a dot product between two vector fields. The first vector field, f(s, a), captures the environment's local dynamics at the state s; the second component, {\phi}(s, g), captures the global relationship between the current state and the goal. We show that our bilinear decomposition scheme substantially improves data efficiency and has superior transfer to out-of-distribution goals compared to prior methods. Empirical evidence is provided on the simulated Fetch robot task-suite and dexterous manipulation with a Shadow hand.
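    The decomposition itself is easy to state in code. The sketch below uses single tanh layers as stand-ins for the two networks f(s, a) and phi(s, g); all dimensions and weights are illustrative choices, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bilinear decomposition Q(s, a, g) = f(s, a) . phi(s, g)
d_s, d_a, d_g, d_latent = 4, 2, 3, 8
W_f = rng.normal(size=(d_latent, d_s + d_a))    # f(s, a): local-dynamics head
W_phi = rng.normal(size=(d_latent, d_s + d_g))  # phi(s, g): goal-relation head

def q_value(s, a, g):
    f = np.tanh(W_f @ np.concatenate([s, a]))
    phi = np.tanh(W_phi @ np.concatenate([s, g]))
    return f @ phi  # dot product => rank-limited Q

s, a, g = rng.normal(size=d_s), rng.normal(size=d_a), rng.normal(size=d_g)
q = q_value(s, a, g)
```

A practical consequence of the factorization is that phi(s, g) can be evaluated once per goal and reused across many candidate actions, which is part of where the data- and compute-efficiency comes from.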
    AlphaZero-Inspired General Board Game Learning and Playing. (arXiv:2204.13307v1 [cs.LG])
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at superhuman level - are truly impressive, these architectures have the drawback that they are very complex and require high computational resources. Many researchers are looking for methods that are similar to AlphaZero but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents. We wrap MCTS for the first time around RL n-tuple networks to create versatile agents while keeping computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results showing that this AlphaZero-inspired agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other algorithms could only defeat Edax up to level 2).
    Exchangeability-Aware Sum-Product Networks. (arXiv:2110.05165v2 [cs.LG] UPDATED)
    Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts.
    Supervised machine learning classification for short straddles on the S&P500. (arXiv:2204.13587v1 [q-fin.CP])
    In this working paper we present our current progress in training machine learning models to execute short option strategies on the S&P500. As a first step, we break the problem down into a supervised classification task: deciding, on a daily basis, whether a short straddle on the S&P500 should be executed or not. We describe the framework we use and present an overview of our evaluation metrics for different classification models. In this preliminary work, using standard machine learning techniques and without hyperparameter search, we find no statistically significant outperformance over a simple "trade always" strategy, but we gain additional insights on how to proceed in further experiments.
    Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension. (arXiv:1905.10395v5 [cs.LG] UPDATED)
    We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines. This work is a gentle extension of [2].
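    The core update can be sketched as a regular gradient step plus a corrective pull toward the leader's parameters. The two-worker toy landscape, step sizes, and pull strength below are illustrative choices, not the paper's exact algorithm or hyperparameters.

```python
import numpy as np

def lsgd_step(x, grad, leader, lr=0.1, pull=0.5):
    """One leader-(stochastic-)gradient-descent update: a gradient step
    plus a corrective pull toward the current best worker (the leader)."""
    return x - lr * grad(x) - lr * pull * (x - leader)

# Toy demo: two workers, each minimizing its own quadratic f_i(x) = ||x - c_i||^2,
# as stand-ins for workers descending toward different local minima.
centers = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
workers = [np.array([5.0, 5.0]), np.array([-5.0, -5.0])]
for _ in range(100):
    losses = [np.sum((w - c) ** 2) for w, c in zip(workers, centers)]
    leader = workers[int(np.argmin(losses))]  # only the leader is broadcast
    workers = [lsgd_step(w, lambda x, c=c: 2 * (x - c), leader)
               for w, c in zip(workers, centers)]
```

Note that the leader's own pull term vanishes (x - leader = 0), so the best worker keeps descending unimpeded while the others are nudged toward it.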
    Overcoming Catastrophic Forgetting via Direction-Constrained Optimization. (arXiv:2011.12581v2 [cs.LG] UPDATED)
    This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-art regularization-based continual learning methods.
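    The regularizer can be sketched as a quadratic penalty on parameter movement away from a previous task's optimum along that task's top "forbidden" principal directions. In the paper these directions are approximated by a linear autoencoder learned per task; in this sketch they are given explicitly, and the weight beta is an illustrative choice.

```python
import numpy as np

def dco_penalty(theta, theta_star, top_dirs, beta=10.0):
    """Direction-constrained regularization term (sketch).

    theta:      current parameter vector.
    theta_star: optimum recovered for a previous task.
    top_dirs:   (k, d) matrix whose rows span the task's top forbidden
                principal directions (here given directly, not via the
                paper's linear autoencoder).
    """
    delta = theta - theta_star
    proj = top_dirs @ delta  # components along the forbidden directions
    return beta * np.sum(proj ** 2)
```

Movement inside the remaining directions (the "cone" of the earlier task) is left unpenalized, which is what lets the new task be learned without forgetting.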
    Evaluating Contextual Embeddings and their Extraction Layers for Depression Assessment. (arXiv:2112.13795v2 [cs.CL] UPDATED)
    Recent works have demonstrated the ability to assess aspects of mental health from personal discourse. At the same time, pre-trained contextual word embedding models have grown to dominate much of NLP, but little is known empirically about how best to apply them to mental health assessment. Using degree of depression as a case study, we perform an empirical analysis of which off-the-shelf language models, individual layers, and combinations of layers seem most promising when applied to human-level NLP tasks. Notably, we find RoBERTa most effective and, despite the standard in past work of using the second-to-last layer or concatenating the last 4 layers, we find layer 19 (sixth-to-last) is at least as good as layer 23 when using a single layer. Further, when using multiple layers, distributing them across the second half of the 24 layers (i.e. layers 12+), rather than the last 4, yields the most accurate results.
    An Adversarial Attack Analysis on Malicious Advertisement URL Detection Framework. (arXiv:2204.13172v1 [cs.LG])
    Malicious advertisement URLs pose a security risk since they are a source of cyber-attacks, and the need to address this issue is growing in both industry and academia. Generally, the attacker delivers an attack vector to the user by means of an email, an advertisement link or another means of communication and directs them to a malicious website to steal sensitive information and defraud them. Existing malicious URL detection techniques are limited in their ability to handle unseen features and generalize to test data. In this study, we extract a novel set of lexical and web-scraped features and employ machine learning techniques to set up a system for detecting fraudulent advertisement URLs. The combined set of six different kinds of features precisely overcomes the obfuscation in fraudulent URL classification. Based on different statistical properties, we use twelve differently formatted datasets for the detection, prediction and classification tasks. We extend our prediction analysis to mismatched and unlabelled datasets. For this framework, we analyze the performance of four machine learning techniques in the detection part: Random Forest, Gradient Boost, XGBoost and AdaBoost. With our proposed method, we achieve a false negative rate as low as 0.0037 while maintaining a high accuracy of 99.63%. Moreover, we devise a novel unsupervised technique for data clustering using the K-Means algorithm for visual analysis. Finally, we analyse the vulnerability of decision-tree-based models under a limited-knowledge attack scenario, considering an exploratory attack and implementing the Zeroth Order Optimization adversarial attack on the detection models.
    Medical Image Segmentation with 3D Convolutional Neural Networks: A Survey. (arXiv:2108.08467v3 [eess.IV] UPDATED)
    Computer-aided medical image analysis plays a significant role in assisting medical practitioners for expert clinical diagnosis and deciding the optimal treatment plan. At present, convolutional neural networks (CNN) are the preferred choice for medical image analysis. In addition, with the rapid advancements in three-dimensional (3D) imaging systems and the availability of excellent hardware and software support to process large volumes of data, 3D deep learning methods are gaining popularity in medical image analysis. Here, we present an extensive review of the recently evolved 3D deep learning methods in medical image segmentation. Furthermore, the research gaps and future directions in 3D medical image segmentation are discussed.
    Improving the Robustness of Federated Learning for Severely Imbalanced Datasets. (arXiv:2204.13414v1 [cs.LG])
    With the ever-increasing data deluge and the success of deep neural networks, research on distributed deep learning has become pronounced. Two common approaches to achieving such distributed learning are synchronous and asynchronous weight updates. In this manuscript, we explore very simplistic synchronous weight update mechanisms. It has been seen that with an increasing number of worker nodes, performance degrades drastically. This effect is studied in the context of extreme imbalanced classification (e.g. outlier detection). In practical cases, the assumed i.i.d. conditions may not be fulfilled. There may also arise global class imbalance situations, as in outlier detection, where the local servers receive severely imbalanced data and may not get any samples from the minority class. In that case, the DNNs on the local servers become completely biased towards the majority class they receive. This highly impacts the learning at the parameter server (which practically does not see any data). It has been observed that in a parallel setting, if one uses the existing federated weight update mechanisms at the parameter server, performance degrades drastically with an increasing number of worker nodes. This is mainly because, with more nodes, there is a high chance that one worker node gets a very small portion of the data, either not enough to train the model without overfitting or with a highly imbalanced class distribution. The chapter hence proposes a workaround to this problem by introducing the concept of adaptive cost-sensitive momentum averaging. For the proposed system, there was no to minimal degradation in performance, while most of the other methods hit their bottom performance before that.
    Policy Gradient Approach to Compilation of Variational Quantum Circuits. (arXiv:2111.10227v2 [quant-ph] UPDATED)
    We propose a method for finding approximate compilations of quantum unitary transformations, based on techniques from policy gradient reinforcement learning. The choice of a stochastic policy allows us to rephrase the optimization problem in terms of probability distributions, rather than variational gates. In this framework, finding the optimal configuration is done by optimizing over distribution parameters, rather than over free angles. We show numerically that this approach can be more competitive than gradient-free methods, for comparable amounts of resources (i.e. quantum circuit runs). Another interesting feature of this approach to variational compilation is that it does not need a separate register and long-range interactions to estimate the end-point fidelity, which is an improvement over methods which rely on the Hilbert-Schmidt test. We expect these techniques to be relevant for training variational circuits in other contexts.
    Vitruvion: A Generative Model of Parametric CAD Sketches. (arXiv:2109.14124v2 [cs.LG] UPDATED)
    Parametric computer-aided design (CAD) tools are the predominant way that engineers specify physical structures, from bicycle pedals to airplanes to printed circuit boards. The key characteristic of parametric CAD is that design intent is encoded not only via geometric primitives, but also by parameterized constraints between the elements. This relational specification can be viewed as the construction of a constraint program, allowing edits to coherently propagate to other parts of the design. Machine learning offers the intriguing possibility of accelerating the design process via generative modeling of these structures, enabling new tools such as autocompletion, constraint inference, and conditional synthesis. In this work, we present such an approach to generative modeling of parametric CAD sketches, which constitute the basic computational building blocks of modern mechanical design. Our model, trained on real-world designs from the SketchGraphs dataset, autoregressively synthesizes sketches as sequences of primitives, with initial coordinates, and constraints that reference back to the sampled primitives. As samples from the model match the constraint graph representation used in standard CAD software, they may be directly imported, solved, and edited according to downstream design tasks. In addition, we condition the model on various contexts, including partial sketches (primers) and images of hand-drawn sketches. Evaluation of the proposed approach demonstrates its ability to synthesize realistic CAD sketches and its potential to aid the mechanical design workflow.
    Optimal Transport for Unsupervised Denoising Learning. (arXiv:2108.02574v4 [eess.IV] UPDATED)
    Recently, much progress has been made in unsupervised denoising learning. However, existing methods more or less rely on some assumptions on the signal and/or degradation model, which limits their practical performance. How to construct an optimal criterion for unsupervised denoising learning without any prior knowledge on the degradation model is still an open question. Toward answering this question, this work proposes a criterion for unsupervised denoising learning based on the optimal transport theory. This criterion has favorable properties, e.g., approximately maximal preservation of the information of the signal, whilst achieving perceptual reconstruction. Furthermore, though a relaxed unconstrained formulation is used in practical implementation, we prove that the relaxed formulation in theory has the same solution as the original constrained formulation. Experiments on synthetic and real-world data, including realistic photographic, microscopy, depth, and raw depth images, demonstrate that the proposed method even compares favorably with supervised methods, e.g., approaching the PSNR of supervised methods while having better perceptual quality. Particularly, for spatially correlated noise and realistic microscopy images, the proposed method not only achieves better perceptual quality but also has higher PSNR than supervised methods. Besides, it shows remarkable superiority in harsh practical conditions with complex noise, e.g., raw depth images. Code is available at https://github.com/wangweiSJTU/OTUR.
    Deepfake Forensics via An Adversarial Game. (arXiv:2103.13567v2 [cs.CV] UPDATED)
    With the progress in AI-based facial forgery (i.e., deepfake), people are increasingly concerned about its abuse. Although effort has been made to train classification (also known as deepfake detection) models to recognize such forgeries, existing models suffer from poor generalization to unseen forgery technologies and high sensitivity to changes in image/video quality. In this paper, we advocate adversarial training for improving the generalization ability to both unseen facial forgeries and unseen image/video qualities. We believe that training with samples adversarially crafted to attack the classification models considerably improves their generalization ability. Considering that AI-based face manipulation often leads to high-frequency artifacts that can be easily spotted by models yet are difficult to generalize from, we further propose a new adversarial training method that attempts to blur out these specific artifacts by introducing pixel-wise Gaussian blurring models. With adversarial training, the classification models are forced to learn more discriminative and generalizable features, and the effectiveness of our method is verified by ample empirical evidence. Our code will be made publicly available.
    Multi Type Mean Field Reinforcement Learning. (arXiv:2002.02513v6 [cs.MA] UPDATED)
    Mean field theory provides an effective way of scaling multiagent reinforcement learning algorithms to environments with many agents that can be abstracted by a virtual mean agent. In this paper, we extend mean field multiagent algorithms to multiple types. The types enable the relaxation of a core assumption in mean field reinforcement learning, namely that all agents in the environment play almost similar strategies and have the same goal. We conduct experiments on three different testbeds for many-agent reinforcement learning, based on the standard MAgent framework. We consider two different kinds of mean field environments: a) games where agents belong to predefined types that are known a priori, and b) games where the type of each agent is unknown and therefore must be learned from observations. We introduce new algorithms for each kind of game and demonstrate their superior performance over state-of-the-art algorithms that assume all agents belong to the same type, as well as other baseline algorithms in the MAgent framework.
    KING: Generating Safety-Critical Driving Scenarios for Robust Imitation via Kinematics Gradients. (arXiv:2204.13683v1 [cs.RO])
    Simulators offer the possibility of safe, low-cost development of self-driving systems. However, current driving simulators exhibit naïve behavior models for background traffic. Hand-tuned scenarios are typically added during simulation to induce safety-critical situations. An alternative approach is to adversarially perturb the background traffic trajectories. In this paper, we study this approach to safety-critical driving scenario generation using the CARLA simulator. We use a kinematic bicycle model as a proxy to the simulator's true dynamics and observe that gradients through this proxy model are sufficient for optimizing the background traffic trajectories. Based on this finding, we propose KING, which generates safety-critical driving scenarios with a 20% higher success rate than black-box optimization. By solving the scenarios generated by KING using a privileged rule-based expert algorithm, we obtain training data for an imitation learning policy. After fine-tuning on this new data, we show that the policy becomes better at avoiding collisions. Importantly, our generated data leads to reduced collisions on both held-out scenarios generated via KING as well as traditional hand-crafted scenarios, demonstrating improved robustness.
    Fuzzy Cognitive Maps and Hidden Markov Models: Comparative Analysis of Efficiency within the Confines of the Time Series Classification Task. (arXiv:2204.13455v1 [cs.LG])
    Time series classification is a popular machine learning task. In this paper, we explore the application of Hidden Markov Models (HMMs) to time series classification. We distinguish between two modes of HMM application: in the first, a single model is built for each class; in the second, one HMM is built for each time series. We then transfer both approaches for classifier construction to the domain of Fuzzy Cognitive Maps (FCMs). The four identified models, HMM NN (one HMM per series), HMM 1C (one HMM per class), FCM NN, and FCM 1C, are then studied in a series of experiments. We compare the performance of the different models and investigate the impact of their hyperparameters on time series classification accuracy. The empirical evaluation shows a clear advantage of the one-model-per-series approach. The results show that the choice between HMM and FCM should be dataset-dependent.
    Autoencoder based Hybrid Multi-Task Predictor Network for Daily Open-High-Low-Close Prices Prediction of Indian Stocks. (arXiv:2204.13422v1 [cs.LG])
    Stock prices are highly volatile, and sudden changes in trend are often very problematic for traditional forecasting models to handle. Standard Long Short-Term Memory (LSTM) networks are regarded as the state-of-the-art models for such predictions, but they fail to handle sudden and drastic changes in the price trend. Moreover, there are inherent constraints among the open, high, low, and close (OHLC) prices of a stock, and the literature lacks studies of this inherent property of OHLC prices. We argue that predicting the OHLC prices for the next day is much more informative than predicting the trend of a stock, as the trend is mostly calculated from these OHLC prices anyway. The problem is mainly focused on Buy-Today Sell-Tomorrow (BTST) trading. In this regard, autoencoders (AEs) pre-trained on the stock prices may be beneficial. We propose a novel framework in which a pre-trained encoder is cascaded in front of a multi-task predictor network. This hybrid network can leverage the power of a combination of networks and can both handle the OHLC constraints and capture sudden drastic changes in the prices. We find that such a network is much more efficient at predicting stock prices. The experiments are extended to recommend the most profitable and most overbought stocks for the next day. The model has been tested on multiple Indian companies, and the recommendations from the proposed model did not result in a single loss over a test period of 300 days.
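The inherent OHLC constraints mentioned above are simply that the high is at least, and the low at most, both the open and the close. A minimal sketch of enforcing them on raw model outputs follows; `enforce_ohlc` is a hypothetical post-processing helper, not the paper's architecture, which handles the constraints within the network itself.

```python
def enforce_ohlc(raw_open, raw_high, raw_low, raw_close):
    """Map unconstrained predictions to a valid OHLC tuple.

    The constraints are: high >= max(open, close) and
    low <= min(open, close). Here we clamp high/low against
    open/close; a model could instead predict non-negative offsets.
    """
    o, c = raw_open, raw_close
    h = max(raw_high, o, c)
    l = min(raw_low, o, c)
    return o, h, l, c
```

For example, a raw prediction with high below the open is repaired by pulling the high up to the larger of open and close.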
    Cumulative Stay-time Representation for Electronic Health Records in Medical Event Time Prediction. (arXiv:2204.13451v1 [cs.LG])
    We address the problem of predicting when a disease will develop, i.e., medical event time (MET), from a patient's electronic health record (EHR). The MET of non-communicable diseases like diabetes is highly correlated to cumulative health conditions, more specifically, how much time the patient spent with specific health conditions in the past. The common time-series representation is indirect in extracting such information from EHR because it focuses on detailed dependencies between values in successive observations, not cumulative information. We propose a novel data representation for EHR called cumulative stay-time representation (CTR), which directly models such cumulative health conditions. We derive a trainable construction of CTR based on neural networks that has the flexibility to fit the target data and scalability to handle high-dimensional EHR. Numerical experiments using synthetic and real-world datasets demonstrate that CTR alone achieves a high prediction performance, and it enhances the performance of existing models when combined with them.
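The idea of a cumulative stay-time representation can be sketched with a fixed, hand-chosen binning of health states; the paper instead learns a trainable construction with neural networks, and the interval-attribution convention and `bins` below are illustrative assumptions only.

```python
def cumulative_stay_time(times, states, bins):
    """Accumulate how long each (discretized) health state was held.

    times: sorted observation timestamps; states: observed values at
    the start of each interval; bins: (lo, hi) ranges defining
    health-condition bins. Each inter-observation interval is
    attributed to the state at its start (one simple convention).
    """
    totals = [0.0] * len(bins)
    for t0, t1, s in zip(times, times[1:], states):
        for i, (lo, hi) in enumerate(bins):
            if lo <= s < hi:
                totals[i] += t1 - t0
                break
    return totals
```

The result is a fixed-length vector of time-in-state totals, which can be fed to a downstream event-time predictor regardless of how many raw observations the record contains.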
    Reusability and Transferability of Macro Actions for Reinforcement Learning. (arXiv:1908.01478v3 [cs.NE] UPDATED)
    Conventional reinforcement learning (RL) typically determines an appropriate primitive action at each timestep. However, by using a proper macro action, defined as a sequence of primitive actions, an agent is able to bypass intermediate states on the way to a farther state and facilitate its learning procedure. The question we investigate is what beneficial properties macro actions may possess. In this paper, we unveil the properties of reusability and transferability of macro actions. The first property, reusability, means that a macro action generated along with one RL method can be reused by another RL method for training, while the second, transferability, means that a macro action can be utilized for training agents in similar environments with different reward settings. In our experiments, we first generate macro actions along with RL methods. We then provide a set of analyses to reveal the properties of reusability and transferability of the generated macro actions.
    A unified theory of information transfer and causal relation. (arXiv:2204.13598v1 [cond-mat.stat-mech])
    Information transfer between coupled stochastic dynamics, measured by transfer entropy and information flow, has been suggested as the physical process underlying the causal relation of systems. While information transfer analysis has found booming applications in both science and engineering, critical mysteries about its foundations remain unsolved. Fundamental yet difficult questions concern how information transfer and causal relation originate, what they depend on, how they differ from each other, and whether they arise from a unified and general quantity. These questions essentially determine the validity of measuring causal relation via information transfer. Here we lay a complete theoretical basis for information transfer and causal relation. Beyond the well-known relations between these concepts that hold only conditionally, we demonstrate that information transfer and causal relation universally originate from specific information synergy and redundancy phenomena characterized by high-order mutual information. More importantly, our theory analytically explains the mechanisms by which information transfer and causal relation originate, vanish, and differ from each other. Moreover, our theory naturally defines the effect sizes of information transfer and causal relation based on high-dimensional coupling events. These results may provide a unified view of information, synergy, and causal relation, bridging Pearl's causal inference theory in computer science and information transfer analysis in physics.
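For concreteness, the transfer entropy underlying such analyses can be estimated for discrete sequences with a simple plug-in estimator (history length 1). This is a sketch of the standard definition, not the paper's formalism.

```python
from collections import Counter
from math import log2

def transfer_entropy(x, y):
    """Plug-in transfer entropy T_{Y->X} for two discrete sequences.

    T = sum over (x1, x0, y0) of p(x1, x0, y0) *
        log2[ p(x1 | x0, y0) / p(x1 | x0) ],
    estimated from empirical counts with history length 1.
    """
    triples = Counter(zip(x[1:], x[:-1], y[:-1]))
    pairs_xy = Counter(zip(x[:-1], y[:-1]))
    pairs_xx = Counter(zip(x[1:], x[:-1]))
    singles = Counter(x[:-1])
    n = len(x) - 1
    te = 0.0
    for (x1, x0, y0), c in triples.items():
        p_joint = c / n
        p_cond_xy = c / pairs_xy[(x0, y0)]
        p_cond_x = pairs_xx[(x1, x0)] / singles[x0]
        te += p_joint * log2(p_cond_xy / p_cond_x)
    return te
```

When X is a delayed copy of a fair random bit stream Y, the estimate approaches 1 bit; when X is independent of Y, it approaches 0 (up to plug-in bias).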
    Unaligned Supervision For Automatic Music Transcription in The Wild. (arXiv:2204.13668v1 [cs.SD])
    Multi-instrument Automatic Music Transcription (AMT), the decoding of a musical recording into semantic musical content, is one of the holy grails of Music Information Retrieval. Current AMT approaches are restricted to piano and (some) guitar recordings, due to difficult data collection. To overcome data collection barriers, previous AMT approaches employ musical scores in the form of a digitized version of the same song or piece. The scores are typically aligned using audio features and strenuous human intervention to generate training labels. We introduce NoteEM, a method for simultaneously training a transcriber and aligning the scores to their corresponding performances, in a fully automated process. Using this unaligned supervision scheme, complemented by pseudo-labels and pitch-shift augmentation, our method enables training on in-the-wild recordings with unprecedented accuracy and instrumental variety. Using only synthetic data and unaligned supervision, we report SOTA note-level accuracy on the MAPS dataset, and large favorable margins in cross-dataset evaluations. We also demonstrate robustness and ease of use; we report comparable results when training on a small, easily obtainable, self-collected dataset, and we propose an alternative labeling for the MusicNet dataset, which we show to be more accurate. Our project page is available at https://benadar293.github.io
    Process-BERT: A Framework for Representation Learning on Educational Process Data. (arXiv:2204.13607v1 [cs.LG])
    Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized intervention. However, analyzing process data is challenging since the specific format of process data varies a lot depending on different learning/testing scenarios. In this paper, we propose a framework for learning representations of educational process data that is applicable across many different learning scenarios. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 nation's report card data mining competition dataset that consists of student problem-solving process data and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative.
    Unlocking High-Accuracy Differentially Private Image Classification through Scale. (arXiv:2204.13650v1 [cs.LG])
    Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained 200-layer Normalizer-Free ResNet, we achieve a remarkable 77.1% top-1 accuracy on ImageNet under (1, 8*10^{-7})-DP, and achieve 81.1% under (8, 8*10^{-7})-DP. This markedly exceeds the previous SOTA of 47.9% under a larger privacy budget of (10, 10^{-6})-DP. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.
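The noise-injection mechanism of DP-SGD referred to above can be sketched as follows. This is an illustration only: real DP training computes per-example gradients from the model and uses a privacy accountant to translate the noise multiplier `sigma` into an (ε, δ) guarantee.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr=0.1, clip=1.0, sigma=1.0, rng=None):
    """One DP-SGD update: clip each per-example gradient to L2 norm
    `clip`, sum, add Gaussian noise with std sigma * clip, average
    over the batch, and take a gradient step.
    """
    rng = np.random.default_rng() if rng is None else rng
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip / max(norm, 1e-12)))
    noise = rng.normal(0.0, sigma * clip, size=params.shape)
    g_hat = (np.sum(clipped, axis=0) + noise) / len(per_example_grads)
    return params - lr * g_hat
```

With `sigma=0` the update reduces to ordinary SGD on clipped gradients, which makes the clipping behavior easy to check in isolation.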
    Bona fide Riesz projections for density estimation. (arXiv:2204.13606v1 [eess.SP])
    The projection of sample measurements onto a reconstruction space represented by a basis on a regular grid is a powerful and simple approach to estimate a probability density function. In this paper, we focus on Riesz bases and propose a projection operator that, in contrast to previous works, guarantees the bona fide properties for the estimate, namely, non-negativity and total probability mass $1$. Our bona fide projection is defined as a convex problem. We propose solution techniques and evaluate them. Results suggest an improved performance, specifically in circumstances prone to rippling effects.
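As a simpler cousin of the paper's convex bona fide projection, the classic sort-based Euclidean projection onto the probability simplex shows how non-negativity and unit total mass can be enforced jointly. This is not the Riesz-basis solver from the paper; it treats the coefficients directly and is exact only under simplifying assumptions such as a partition-of-unity basis on unit-width cells.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {c : c >= 0, sum(c) = 1},
    via the standard sort-and-threshold algorithm."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * idx > (css - 1))[0][-1]  # last feasible index
    theta = (css[rho] - 1) / (rho + 1)            # shift to hit mass 1
    return np.maximum(v - theta, 0.0)
```

A point already on the simplex is left unchanged, while an infeasible coefficient vector is mapped to the nearest non-negative vector with total mass 1.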
    Foundations for learning from noisy quantum experiments. (arXiv:2204.13691v1 [quant-ph])
    Understanding what can be learned from experiments is central to scientific progress. In this work, we use a learning-theoretic perspective to study the task of learning physical operations in a quantum machine when all operations (state preparation, dynamics, and measurement) are a priori unknown. We prove that, without any prior knowledge, if one can explore the full quantum state space by composing the operations, then every operation can be learned. When one cannot explore the full state space but all operations are approximately known and noise in Clifford gates is gate-independent, we find an efficient algorithm for learning all operations up to a single unlearnable parameter characterizing the fidelity of the initial state. For learning a noise channel on Clifford gates to a fixed accuracy, our algorithm uses quadratically fewer experiments than previously known protocols. Under more general conditions, the true description of the noise can be unlearnable; for example, we prove that no benchmarking protocol can learn gate-dependent Pauli noise on Clifford+T gates even under perfect state preparation and measurement. Despite not being able to learn the noise, we show that a noisy quantum computer that performs entangled measurements on multiple copies of an unknown state can yield a large advantage in learning properties of the state compared to a noiseless device that measures individual copies and then processes the measurement data using a classical computer. Concretely, we prove that noisy quantum computers with two-qubit gate error rate $\epsilon$ can achieve a learning task using $N$ copies of the state, while $N^{\Omega(1/\epsilon)}$ copies are required classically.
    Phase Shift Design in RIS Empowered Wireless Networks: From Optimization to AI-Based Methods. (arXiv:2204.13372v1 [cs.LG])
    Reconfigurable intelligent surfaces (RISs) have a revolutionary capability to customize the radio propagation environment for wireless networks. To fully exploit the advantages of RISs in wireless systems, the phases of the reflecting elements must be jointly designed with conventional communication resources, such as beamformers, transmit power, and computation time. However, due to the unique constraints on the phase shift, and massive numbers of reflecting units and users in large-scale networks, the resulting optimization problems are challenging to solve. This paper provides a review of current optimization methods and artificial intelligence-based methods for handling the constraints imposed by RIS and compares them in terms of solution quality and computational complexity. Future challenges in phase shift optimization involving RISs are also described and potential solutions are discussed.
    It's DONE: Direct ONE-shot learning without training optimization. (arXiv:2204.13361v1 [cs.LG])
    Learning a new concept from a single example is a remarkable capability of the human brain, and it is drawing attention in machine learning as the one-shot learning task. In this paper, we propose the simplest method for this task, named Direct ONE-shot learning (DONE). DONE adds a new class to a pretrained deep neural network (DNN) classifier with neither training optimization nor modification of the other classes. DONE is inspired by Hebbian theory: it directly uses the input activations of the final dense layer, obtained from a single example of the new class, as the connection weights (synaptic strengths) of a newly provided output neuron for that class. DONE requires just one inference to obtain the input of the final dense layer, and its procedure is simple and deterministic, requiring no parameter tuning or hyperparameters. The performance of DONE depends entirely on the pretrained DNN model used as a backbone, and we confirmed that DONE with a well-trained backbone achieves practical-level accuracy. DONE has several advantages, including practical use of DNNs in settings where training is too costly, evaluation of existing DNN models, and improved understanding of the brain. DONE suggests that one-shot learning is an easy task achievable by a simple principle, not only for humans but also for current well-trained DNN models.
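The core of DONE as described above can be sketched in a few lines; the feature normalization and zero bias below are common-practice assumptions rather than the paper's exact recipe.

```python
import numpy as np

def done_add_class(W, b, feature_new):
    """DONE sketch: append a new output neuron whose weights are the
    final-layer input activations of a single example of the new class.

    W: (num_classes, d) final dense layer weights; b: (num_classes,)
    biases; feature_new: (d,) activation vector of the one-shot example.
    """
    f = feature_new / np.linalg.norm(feature_new)  # assumed normalization
    W_new = np.vstack([W, f])                       # new row = the example's features
    b_new = np.append(b, 0.0)                       # assumed zero bias
    return W_new, b_new
```

After the update, the one-shot example itself (and features similar to it) produce a large inner product with the new row, so the new class wins the argmax, while the original rows are untouched.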
    List-Mode PET Image Reconstruction Using Deep Image Prior. (arXiv:2204.13404v1 [physics.med-ph])
    List-mode positron emission tomography (PET) image reconstruction is an important tool for PET scanners with many lines-of-response (LORs) and additional information such as time-of-flight and depth-of-interaction. Deep learning is one possible solution for enhancing the quality of PET image reconstruction. However, the application of deep learning techniques to list-mode PET image reconstruction has not progressed, because list data is a sequence of bit codes and is unsuitable for processing by convolutional neural networks (CNNs). In this study, we propose a novel list-mode PET image reconstruction method using an unsupervised CNN called deep image prior (DIP) and a framework of alternating direction method of multipliers. The proposed list-mode DIP reconstruction (LM-DIPRecon) method alternately iterates the regularized list-mode dynamic row action maximum likelihood algorithm (LM-DRAMA) and magnetic-resonance-imaging-conditioned DIP (MR-DIP). We evaluated LM-DIPRecon using both simulation and clinical data, and it achieved sharper images and better contrast-noise tradeoff curves than LM-DRAMA and MR-DIP. These results indicate that LM-DIPRecon is useful for quantitative PET imaging with limited events. In addition, as list data has finer temporal information than dynamic sinograms, list-mode deep image prior reconstruction is expected to be useful for 4D PET imaging and motion correction.
    Poisoning Deep Learning based Recommender Model in Federated Learning Scenarios. (arXiv:2204.13594v1 [cs.IR])
    Various attack methods against recommender systems have been proposed in recent years, and the security issues of recommender systems have drawn considerable attention. Traditional attacks attempt to make target items recommended to as many users as possible by poisoning the training data. Benefiting from its protection of users' private data, federated recommendation can effectively defend against such attacks, and quite a few works have therefore been devoted to developing federated recommender systems. To show that current federated recommendation is still vulnerable, in this work we design attack approaches targeting deep-learning-based recommender models in federated learning scenarios. Specifically, our attacks generate poisoned gradients for manipulated malicious users to upload, based on two strategies (i.e., random approximation and hard user mining). Extensive experiments show that our well-designed attacks can effectively poison the target models, and their attack effectiveness sets a new state of the art.
    Model Selection, Adaptation, and Combination for Deep Transfer Learning through Neural Networks in Renewable Energies. (arXiv:2204.13293v1 [cs.LG])
    There is recent interest in using model hubs, collections of pre-trained models, in computer vision tasks. To utilize a model hub, we first select a source model and then adapt the model to the target task to compensate for differences. While research on model selection and adaptation for computer vision tasks is still limited, this holds even more for the field of renewable power. At the same time, providing forecasts is a crucial challenge given the increasing demand for power forecasts based on weather features from numerical weather prediction. We close these gaps by conducting the first thorough experiment on model selection and adaptation for transfer learning in renewable power forecasting, adopting recent results from the field of computer vision on six datasets. We adapt models based on data from different seasons and limit the amount of training data. As an extension of the current state of the art, we utilize Bayesian linear regression for forecasting the response based on features extracted from a neural network. This approach outperforms the baseline with only seven days of training data. We further show how combining multiple models through ensembles can significantly improve the model selection and adaptation approach. In fact, with more than 30 days of training data, both proposed model combination techniques achieve results similar to those of models trained with a full year of training data.
    On Parametric Optimal Execution and Machine Learning Surrogates. (arXiv:2204.08581v2 [q-fin.TR] UPDATED)
    We investigate optimal order execution problems in discrete time with instantaneous price impact and stochastic resilience. First, in the setting of linear transient price impact we derive a closed-form recursion for the optimal strategy, extending the deterministic results from Obizhaeva and Wang (J Financial Markets, 2013). Second, we develop a numerical algorithm based on dynamic programming and deep learning for the case of nonlinear transient price impact as proposed by Bouchaud et al. (Quant. Finance, 2004). Specifically, we utilize an actor-critic framework that constructs two neural-network (NN) surrogates for the value function and the feedback control. The flexible scalability of NN functional approximators enables parametric learning, i.e., incorporating several model or market parameters as part of the input space. Precise calibration of price impact, resilience, etc., is known to be extremely challenging and hence it is critical to understand sensitivity of the execution policy to these parameters. Our NN learner organically scales across multiple input dimensions and is shown to accurately approximate optimal strategies across a wide range of parameter configurations. We provide a fully reproducible Jupyter Notebook with our NN implementation, which is of independent pedagogical interest, demonstrating the ease of use of NN surrogates in (parametric) stochastic control problems.
    Predicting Sleeping Quality using Convolutional Neural Networks. (arXiv:2204.13584v1 [eess.SP])
    Identifying sleep stages and patterns is an essential part of diagnosing and treating sleep disorders. With the advancement of smart technologies, sensor data related to sleeping patterns can be captured easily. In this paper, we propose a Convolutional Neural Network (CNN) architecture that improves classification performance. In particular, we benchmark the classification performance of different methods, including traditional machine learning methods such as Logistic Regression (LR), Decision Trees (DT), k-Nearest Neighbour (k-NN), Naive Bayes (NB), and Support Vector Machines (SVM), on three publicly available sleep datasets. The accuracy, sensitivity, specificity, precision, recall, and F-score are reported and will serve as a baseline to stimulate research in this direction in the future.
    Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss. (arXiv:2204.13437v1 [cs.SD])
    Recent deep learning Text-to-Speech (TTS) systems have achieved impressive performance by generating speech close to human parity. However, they suffer from training stability issues as well as incorrect alignment of the intermediate acoustic representation with the input text sequence. In this work, we introduce Regotron, a regularized version of Tacotron2 which aims to alleviate the training issues and at the same time produce monotonic alignments. Our method augments the vanilla Tacotron2 objective function with an additional term, which penalizes non-monotonic alignments in the location-sensitive attention mechanism. By properly adjusting this regularization term we show that the loss curves become smoother, and at the same time Regotron consistently produces monotonic alignments in unseen examples even at an early stage (13\% of the total number of epochs) of its training process, whereas the fully converged Tacotron2 fails to do so. Moreover, our proposed regularization method has no additional computational overhead, while reducing common TTS mistakes and achieving slightly improved speech naturalness according to subjective mean opinion scores (MOS) collected from 50 evaluators.
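A monotonicity penalty of the kind described can be sketched by tracking the expected input position attended to at each decoder step and penalizing backward movement. This is a simple differentiable surrogate, not necessarily Regotron's exact regularization term.

```python
import numpy as np

def monotonicity_penalty(attention):
    """Penalize non-monotonic alignments in an attention matrix.

    attention: (T_dec, T_enc), each row an attention distribution over
    input positions. We compute the expected input position per decoder
    step and sum up any backward jumps.
    """
    positions = attention @ np.arange(attention.shape[1])  # expected index per step
    backwards = np.maximum(positions[:-1] - positions[1:], 0.0)
    return float(np.sum(backwards))
```

A perfectly left-to-right alignment incurs zero penalty; an alignment that walks backwards through the input is penalized in proportion to how far it regresses.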
    Multi-Player Multi-Armed Bandits with Finite Shareable Resources Arms: Learning Algorithms & Applications. (arXiv:2204.13502v1 [cs.LG])
    Multi-player multi-armed bandits (MMAB) study how decentralized players cooperatively play the same multi-armed bandit so as to maximize their total cumulative rewards. Existing MMAB models mostly assume when more than one player pulls the same arm, they either have a collision and obtain zero rewards, or have no collision and gain independent rewards, both of which are usually too restrictive in practical scenarios. In this paper, we propose an MMAB with shareable resources as an extension to the collision and non-collision settings. Each shareable arm has finite shareable resources and a "per-load" reward random variable, both of which are unknown to players. The reward from a shareable arm is equal to the "per-load" reward multiplied by the minimum between the number of players pulling the arm and the arm's maximal shareable resources. We consider two types of feedback: sharing demand information (SDI) and sharing demand awareness (SDA), each of which provides different signals of resource sharing. We design the DPE-SDI and SIC-SDA algorithms to address the shareable arm problem under these two cases of feedback respectively and prove that both algorithms have logarithmic regrets that are tight in the number of rounds. We conduct simulations to validate both algorithms' performance and show their utilities in wireless networking and edge computing.
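The shareable-arm reward model can be stated directly in code; the Bernoulli per-load reward in `play_round` is an illustrative choice, as the paper only assumes an unknown "per-load" reward random variable.

```python
import random

def shareable_arm_reward(per_load, n_players, capacity):
    """Total reward of a shareable arm in one round: the realized
    per-load reward times min(#players pulling the arm, the arm's
    maximal shareable resources)."""
    return per_load * min(n_players, capacity)

def play_round(pulls, capacities, mean_rewards, rng):
    """Simulate one round: pulls[k] players pull arm k; per-load
    rewards are Bernoulli with the given means (illustrative model)."""
    rewards = []
    for k, n in enumerate(pulls):
        per_load = 1.0 if rng.random() < mean_rewards[k] else 0.0
        rewards.append(shareable_arm_reward(per_load, n, capacities[k]))
    return rewards
```

Note how reward saturates at the capacity: a sixth player joining an arm with capacity five adds nothing, which is what distinguishes this setting from both the collision and the independent-reward models.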
    Predicting batch queue job wait times for informed scheduling of urgent HPC workloads. (arXiv:2204.13543v1 [cs.DC])
    There is increasing interest in the use of HPC machines for urgent workloads to help tackle disasters as they unfold. Whilst batch queue systems are not ideal in supporting such workloads, many disadvantages can be worked around by accurately predicting when a waiting job will start to run. However, there are numerous challenges in achieving such a prediction with high accuracy, not least because the queue's state can change rapidly and depend upon many factors. In this work we explore a novel machine learning approach for predicting queue wait times, hypothesising that such a model can capture the complex behaviour resulting from the queue policy and other interactions to generate accurate job start times. For ARCHER2 (HPE Cray EX), Cirrus (HPE 8600) and 4-cabinet (HPE Cray EX) we explore how different machine learning approaches and techniques improve the accuracy of our predictions, comparing against the estimation generated by Slurm. We demonstrate that our techniques deliver the most accurate predictions across our machines of interest, with the result of this work being the ability to predict job start times within one minute of the actual start time for around 65\% of jobs on ARCHER2 and 4-cabinet, and 76\% of jobs on Cirrus. When compared against what Slurm can deliver, this represents around 3.8 times better accuracy on ARCHER2 and 18 times better for Cirrus. Furthermore, our approach can accurately predict the start time for three quarters of all jobs within ten minutes of the actual start time on ARCHER2 and 4-cabinet, and for 90\% of jobs on Cirrus. Whilst the driver of this work has been to better facilitate placement of urgent workloads across HPC machines, the insights gained can be used to provide wider benefits to users and also enrich existing batch queue systems and inform policy too.
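The headline metric, the fraction of jobs whose predicted start time falls within a given number of minutes of the actual start, is straightforward to compute; `within_minutes_accuracy` is a hypothetical helper shown only to make the reported figures concrete.

```python
def within_minutes_accuracy(predicted, actual, minutes):
    """Fraction of jobs whose predicted start time is within
    `minutes` of the actual start time (times in seconds)."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if abs(p - a) <= minutes * 60)
    return hits / len(predicted)
```

For instance, "65\% of jobs within one minute" corresponds to this function returning at least 0.65 with `minutes=1` over the evaluation set.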
    BI-GreenNet: Learning Green's functions by boundary integral network. (arXiv:2204.13247v1 [cs.LG])
    Green's function plays a significant role in both theoretical analysis and numerical computing of partial differential equations (PDEs). However, in most cases, Green's function is difficult to compute. The difficulties are threefold. First, compared with the original PDE, the dimension of Green's function is doubled, making it impossible to handle with traditional mesh-based methods. Second, Green's function usually contains singularities, which increase the difficulty of obtaining a good approximation. Last, the computational domain may be very complex or even unbounded. To overcome these problems, we leverage the fundamental solution, the boundary integral method, and neural networks to develop a new method for computing Green's function with high accuracy in this paper. We focus on Green's function of the Poisson and Helmholtz equations in both bounded and unbounded domains. We also consider the Poisson and Helmholtz equations in domains with interfaces. Extensive numerical experiments illustrate the efficiency and accuracy of our method for solving Green's function. In addition, we use the Green's function calculated by our method to solve a class of PDEs, and also obtain high-precision solutions, which shows the good generalization ability of our method.
    Deep graph matching meets mixed-integer linear programming: Relax at your own risk ?. (arXiv:2108.00394v5 [cs.CV] UPDATED)
    Graph matching is an important problem that has received widespread attention, especially in the field of computer vision. Recently, state-of-the-art methods have sought to incorporate graph matching with deep learning. However, there is no research explaining what role the graph matching algorithm plays in the model. Therefore, we propose an approach integrating a MILP formulation of the graph matching problem. This formulation is solved to optimality and provides an inherent baseline. Meanwhile, similar approaches are derived by relaxing the optimality guarantee of the graph matching solver and by introducing a quality level. This quality level controls the quality of the solutions provided by the graph matching solver. In addition, several relaxations of the graph matching problem are put to the test. Our experimental evaluation gives several theoretical insights and guides the direction of deep graph matching methods.
    Predicting single-cell perturbation responses for unseen drugs. (arXiv:2204.13545v1 [cs.LG])
    Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely performed bulk RNA-seq HTS is required to enrich single-cell data meaningfully. We introduce a new encoder-decoder architecture to study the perturbational effects of unseen drugs. We combine the model with a transfer learning scheme and demonstrate how training on existing bulk RNA-seq HTS datasets can improve generalisation performance. Better generalisation reduces the need for extensive and costly screens at single-cell resolution. We envision that our proposed method will facilitate more efficient experiment designs through its ability to generate in-silico hypotheses, ultimately accelerating targeted drug discovery.
    Exploring How Anomalous Model Input and Output Alerts Affect Decision-Making in Healthcare. (arXiv:2204.13194v1 [cs.HC])
    An important goal in the field of human-AI interaction is to help users more appropriately trust AI systems' decisions. A situation in which the user may particularly benefit from more appropriate trust is when the AI receives anomalous input or provides anomalous output. To the best of our knowledge, this is the first work towards understanding how anomaly alerts may contribute to appropriate trust of AI. In a formative mixed-methods study with 4 radiologists and 4 other physicians, we explore how AI alerts for anomalous input, very high and low confidence, and anomalous saliency-map explanations affect users' experience with mockups of an AI clinical decision support system (CDSS) for evaluating chest x-rays for pneumonia. We find evidence suggesting that the four anomaly alerts are desired by non-radiologists, and the high-confidence alerts are desired by both radiologists and non-radiologists. In a follow-up user study, we investigate how high- and low-confidence alerts affect the accuracy and thus appropriate trust of 33 radiologists working with AI CDSS mockups. We observe that these alerts do not improve users' accuracy or experience and discuss potential reasons why.
    Machine Learning for Violence Risk Assessment Using Dutch Clinical Notes. (arXiv:2204.13535v1 [cs.LG])
    Violence risk assessment in psychiatric institutions enables interventions to avoid violence incidents. Clinical notes written by practitioners and available in electronic health records are valuable resources capturing unique information, but are seldom used to their full potential. We explore conventional and deep machine learning methods to assess violence risk in psychiatric patients using practitioner notes. The performance of our best models is comparable to the currently used questionnaire-based method, with an area under the Receiver Operating Characteristic curve of approximately 0.8. We find that the deep-learning model BERTje performs worse than conventional machine learning methods. We also evaluate our data and our classifiers to understand the performance of our models better. This is particularly important for the applicability of evaluated classifiers to new data, and is also of great interest to practitioners, due to the increased availability of new data in electronic format.
    Learning Storm Surge with Gradient Boosting. (arXiv:2204.13168v1 [cs.CE])
    Storm surge is a major natural hazard for coastal regions, responsible both for significant property damage and loss of life. Accurate, efficient models of storm surge are needed both to assess long-term risk and to guide emergency management decisions. While high-fidelity ocean circulation models such as the ADvanced CIRCulation (ADCIRC) model can accurately predict storm surge, they are very computationally expensive. Consequently, there have been a number of efforts in recent years to develop data-driven surrogate models for storm surge. While these models can attain good accuracy and are highly efficient, they are often limited to a small geographical region and a fixed set of output locations. We develop a novel surrogate model for peak storm surge prediction based on gradient boosting. Unlike most surrogate approaches, our model is not explicitly constrained to a fixed set of output locations or specific geographical region. The model is trained with a database of 446 synthetic storms that make landfall on the Texas coast and obtains a mean absolute error of 0.25 meters. We additionally present a test of the model on Hurricanes Ike (2008) and Harvey (2017).
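The surrogate-modeling technique named above, gradient boosting for regression, can be sketched in a few lines. This is a toy illustration only: depth-1 trees (stumps) fit to a synthetic 1-D step function, not the paper's storm features or its ADCIRC training database.

```python
def fit_stump(x, residual):
    """Find the threshold on x that best fits the residuals in squared error."""
    best = None
    order = sorted(range(len(x)), key=lambda i: x[i])
    for k in range(1, len(x)):
        thr = (x[order[k - 1]] + x[order[k]]) / 2
        left = [residual[i] for i in range(len(x)) if x[i] <= thr]
        right = [residual[i] for i in range(len(x)) if x[i] > thr]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda v: lm if v <= thr else rm

def gradient_boost(x, y, n_rounds=50, lr=0.3):
    """Additive model: every round fits a stump to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(x)
    stumps = []
    for _ in range(n_rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residual)
        stumps.append(stump)
        pred = [p + lr * stump(v) for p, v in zip(pred, x)]
    return lambda v: base + lr * sum(s(v) for s in stumps)

# Toy 1-D data: a step function that a sum of stumps can fit exactly.
xs = [i / 10 for i in range(30)]
ys = [0.0 if v < 1.5 else 2.0 for v in xs]
surge_model = gradient_boost(xs, ys)
mae = sum(abs(surge_model(v) - t) for v, t in zip(xs, ys)) / len(xs)
```

Production systems would use a library implementation with deeper trees and many input features; the residual-fitting loop is the part all of them share.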
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v1 [cs.LG])
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.
    Learning-to-Rank at the Speed of Sampling: Plackett-Luce Gradient Estimation With Minimal Computational Complexity. (arXiv:2204.10872v2 [cs.LG] UPDATED)
    Plackett-Luce gradient estimation enables the optimization of stochastic ranking models within feasible time constraints through sampling techniques. Unfortunately, the computational complexity of existing methods does not scale well with the length of the rankings, i.e. the ranking cutoff, nor with the item collection size. In this paper, we introduce the novel PL-Rank-3 algorithm that performs unbiased gradient estimation with a computational complexity comparable to the best sorting algorithms. As a result, our novel learning-to-rank method is applicable in any scenario where standard sorting is feasible in reasonable time. Our experimental results indicate large gains in the time required for optimization, without any loss in performance. For the field, our contribution could potentially allow state-of-the-art learning-to-rank methods to be applied to much larger scales than previously feasible.
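The claim that gradient estimation can cost no more than a sort rests on the fact that Plackett-Luce rankings themselves can be sampled at sorting cost. The standard primitive for this is the Gumbel trick (shown below as a sketch; this is the sampling step such estimators build on, not the PL-Rank-3 estimator itself): perturb each item's log-score with independent Gumbel noise and sort.

```python
import math
import random

def sample_pl_ranking(log_scores, rng):
    """Sample one ranking from a Plackett-Luce model via the Gumbel trick:
    add independent Gumbel(0, 1) noise to each log-score and sort
    descending. The cost is O(n log n), i.e. the cost of a sort."""
    keys = [s - math.log(-math.log(rng.random())) for s in log_scores]
    return sorted(range(len(log_scores)), key=lambda i: -keys[i])

rng = random.Random(0)
log_scores = [2.0, 0.0, -2.0]
n_draws = 20000
first = [0, 0, 0]
for _ in range(n_draws):
    first[sample_pl_ranking(log_scores, rng)[0]] += 1
# Under Plackett-Luce, P(item 0 ranked first) = e^2 / (e^2 + e^0 + e^-2),
# which is roughly 0.87; the empirical frequency should be close to that.
freq0 = first[0] / n_draws
```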
    Unsupervised Spatial-spectral Hyperspectral Image Reconstruction and Clustering with Diffusion Geometry. (arXiv:2204.13497v1 [cs.CV])
    Hyperspectral images, which store a hundred or more spectral bands of reflectance, have become an important data source in natural and social sciences. Hyperspectral images are often generated in large quantities at a relatively coarse spatial resolution. As such, unsupervised machine learning algorithms incorporating known structure in hyperspectral imagery are needed to analyze these images automatically. This work introduces the Spatial-Spectral Image Reconstruction and Clustering with Diffusion Geometry (DSIRC) algorithm for partitioning highly mixed hyperspectral images. DSIRC reduces measurement noise through a shape-adaptive reconstruction procedure. In particular, for each pixel, DSIRC locates spectrally correlated pixels within a data-adaptive spatial neighborhood and reconstructs that pixel's spectral signature using those of its neighbors. DSIRC then locates high-density, high-purity pixels far in diffusion distance (a data-dependent distance metric) from other high-density, high-purity pixels and treats these as cluster exemplars, giving each a unique label. Non-modal pixels are assigned the label of their diffusion distance-nearest neighbor of higher density and purity that is already labeled. Strong numerical results indicate that incorporating spatial information through image reconstruction substantially improves the performance of pixel-wise clustering.
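The exemplar-and-propagate labeling step described above (a density-peaks-style procedure) can be sketched on toy data. In this sketch, plain Euclidean distance stands in for the paper's diffusion distance and a naive neighbor count stands in for its density/purity estimates; the variable names are illustrative only.

```python
def dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def density_peak_labels(points, density, n_clusters):
    """The n_clusters points with the largest density * (distance to the
    nearest denser point) become exemplars; every other point inherits the
    label of its nearest denser neighbor, processed in decreasing-density
    order so parents are always labeled first. Assumes the globally
    densest point ends up an exemplar."""
    n = len(points)
    order = sorted(range(n), key=lambda i: -density[i])
    delta, parent = [0.0] * n, [None] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist(points[i], points[j]) for j in range(n))
            continue
        parent[i] = min(order[:rank], key=lambda j: dist(points[i], points[j]))
        delta[i] = dist(points[i], points[parent[i]])
    exemplars = sorted(range(n), key=lambda i: -density[i] * delta[i])[:n_clusters]
    labels = [None] * n
    for c, e in enumerate(exemplars):
        labels[e] = c
    for i in order:
        if labels[i] is None:
            labels[i] = labels[parent[i]]
    return labels

# Two well-separated 1-D "pixel" groups; density is a naive neighbor count.
pts = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
dens = [sum(1 for q in pts if dist(p, q) < 0.5) for p in pts]
labels = density_peak_labels(pts, dens, 2)
```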
    On the Normalizing Constant of the Continuous Categorical Distribution. (arXiv:2204.13290v1 [stat.ML])
    Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in closed form using elementary functions only. In spite of this mathematical simplicity, our understanding of the normalizing constant remains far from complete. In this work, we characterize the numerical behavior of the normalizing constant and we present theoretical and methodological advances that can, in turn, help to enable broader applications of the continuous categorical distribution. Our code is available at https://github.com/cunningham-lab/cb_and_cc/.
    Machine learning for knowledge acquisition and accelerated inverse-design for non-Hermitian systems. (arXiv:2204.13376v1 [physics.optics])
    Non-Hermitian systems offer new platforms for unusual physical properties that can be flexibly manipulated by redistribution of the real and imaginary parts of refractive indices, whose presence breaks conventional wave propagation symmetries, leading to asymmetric reflection and symmetric transmission with respect to the wave propagation direction. Here, we use supervised and unsupervised learning techniques for knowledge acquisition in non-Hermitian systems, which accelerates the inverse-design process. In particular, we construct a deep learning model that relates the transmission and asymmetric reflection in non-conservative settings, and we propose sub-manifold learning to recognize non-Hermitian features from transmission spectra. The developed deep learning framework determines the feasibility of a desired spectral response for a given structure and uncovers the role of effective gain-loss parameters to tailor the spectral response. These findings pave the way for intelligent inverse design and shape our understanding of the physical mechanism in general non-Hermitian systems.
    MetaCVR: Conversion Rate Prediction via Meta Learning in Small-Scale Recommendation Scenarios. (arXiv:2112.13753v5 [cs.LG] UPDATED)
    Different from large-scale platforms such as Taobao and Amazon, CVR modeling in small-scale recommendation scenarios is more challenging due to the severe Data Distribution Fluctuation (DDF) issue. DDF prevents existing CVR models from being effective since 1) several months of data are needed to train CVR models sufficiently in small scenarios, leading to considerable distribution discrepancy between training and online serving; and 2) e-commerce promotions have significant impacts on small scenarios, leading to distribution uncertainty of the upcoming time period. In this work, we propose a novel CVR method named MetaCVR from a perspective of meta learning to address the DDF issue. Firstly, a base CVR model which consists of a Feature Representation Network (FRN) and output layers is designed and trained sufficiently with samples across months. Then we treat time periods with different data distributions as different occasions and obtain positive and negative prototypes for each occasion using the corresponding samples and the pre-trained FRN. Subsequently, a Distance Metric Network (DMN) is devised to calculate the distance metrics between each sample and all prototypes to facilitate mitigating the distribution uncertainty. Finally, we develop an Ensemble Prediction Network (EPN) which incorporates the output of FRN and DMN to make the final CVR prediction. In this stage, we freeze the FRN and train the DMN and EPN with samples from the recent time period, therefore effectively easing the distribution discrepancy. To the best of our knowledge, this is the first study of CVR prediction targeting the DDF issue in small-scale recommendation scenarios. Experimental results on real-world datasets validate the superiority of our MetaCVR, and an online A/B test also shows that our model achieves impressive gains of 11.92% on PCVR and 8.64% on GMV.
    Music Enhancement via Image Translation and Vocoding. (arXiv:2204.13289v1 [cs.SD])
    Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ. This paper presents a deep learning approach to enhance low-quality music recordings by combining (i) an image-to-image translation model for manipulating audio in its mel-spectrogram representation and (ii) a music vocoding model for mapping synthetically generated mel-spectrograms to perceptually realistic waveforms. We find that this approach to music enhancement outperforms baselines which use classical methods for mel-spectrogram inversion and an end-to-end approach directly mapping noisy waveforms to clean waveforms. Additionally, in evaluating the proposed method with a listening test, we analyze the reliability of common audio enhancement evaluation metrics when used in the music domain.
    Control-Aware Prediction Objectives for Autonomous Driving. (arXiv:2204.13319v1 [cs.LG])
    Autonomous vehicle software is typically structured as a modular pipeline of individual components (e.g., perception, prediction, and planning) to help separate concerns into interpretable sub-tasks. Even when end-to-end training is possible, each module has its own set of objectives used for safety assurance, sample efficiency, regularization, or interpretability. However, intermediate objectives do not always align with overall system performance. For example, optimizing the likelihood of a trajectory prediction module might focus more on easy-to-predict agents than safety-critical or rare behaviors (e.g., jaywalking). In this paper, we present control-aware prediction objectives (CAPOs) to evaluate the downstream effect of predictions on control without requiring the planner to be differentiable. We propose two types of importance weights that weight the predictive likelihood: one using an attention model between agents, and another based on control variation when exchanging predicted trajectories for ground truth trajectories. Experimentally, we show our objectives improve overall system performance in suburban driving scenarios using the CARLA simulator.
    Partitioned Variational Inference: A Framework for Probabilistic Federated Learning. (arXiv:2202.12275v4 [stat.ML] UPDATED)
    The proliferation of computing devices has brought about an opportunity to deploy machine learning models on new problem domains using previously inaccessible data. Traditional algorithms for training such models often require data to be stored on a single machine with compute performed by a single node, making them unsuitable for decentralised training on multiple devices. This deficiency has motivated the development of federated learning algorithms, which allow multiple data owners to train collaboratively and use a shared model whilst keeping local data private. However, many of these algorithms focus on obtaining point estimates of model parameters, rather than probabilistic estimates capable of capturing model uncertainty, which is essential in many applications. Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. In this paper we introduce partitioned variational inference (PVI), a general framework for performing VI in the federated setting. We develop new supporting theory for PVI, demonstrating a number of properties that make it an attractive choice for practitioners; use PVI to unify a wealth of fragmented, yet related literature; and provide empirical results that showcase the effectiveness of PVI in a variety of federated settings.
    UNBUS: Uncertainty-aware Deep Botnet Detection System in Presence of Perturbed Samples. (arXiv:2204.09502v2 [cs.CR] UPDATED)
    A rising number of botnet families have been successfully detected using deep learning architectures. As the variety of attacks increases, these architectures should become more robust against attacks, yet they have been proven to be very sensitive to small but well-constructed perturbations in the input. Botnet detection requires extremely low false-positive rates (FPR), which are not commonly attainable in contemporary deep learning; attackers try to increase the FPR by crafting poisoned samples. The majority of recent research has focused on the use of model loss functions to build adversarial examples and robust models. In this paper, two LSTM-based classification algorithms for botnet classification with an accuracy higher than 98% are presented. An adversarial attack is then proposed that reduces the accuracy to about 30%, and, by examining methods for computing uncertainty, a defense method is proposed that restores the accuracy to about 70%. The uncertainty of the accuracy of the proposed methods is investigated using the deep-ensemble and stochastic weight averaging uncertainty quantification methods.
    Actor-Critic Scheduling for Path-Aware Air-to-Ground Multipath Multimedia Delivery. (arXiv:2204.13343v1 [cs.NI])
    Reinforcement Learning (RL) has recently found wide applications in network traffic management and control because some of its variants do not require prior knowledge of network models. In this paper, we present a novel scheduler for real-time multimedia delivery in multipath systems based on an Actor-Critic (AC) RL algorithm. We focus on a challenging scenario of real-time video streaming from an Unmanned Aerial Vehicle (UAV) using multiple wireless paths. The scheduler acting as an RL agent learns in real-time the optimal policy for path selection, path rate allocation and redundancy estimation for flow protection. The scheduler, implemented as a module of the GStreamer framework, can be used in real or simulated settings. The simulation results show that our scheduler can target a very low loss rate at the receiver by dynamically adapting in real-time the scheduling policy to the path conditions without performing training or relying on prior knowledge of network channel models.
    ELM: Embedding and Logit Margins for Long-Tail Learning. (arXiv:2204.13208v1 [cs.LG])
    Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural models, such techniques do not explicitly control the geometry of the learned embeddings. This can be potentially sub-optimal, since embeddings for tail classes may be diffuse, resulting in poor generalization for these classes. We present Embedding and Logit Margins (ELM), a unified approach to enforce margins in logit space, and regularize the distribution of embeddings. This connects losses for long-tail learning to proposals in the literature on metric embedding, and contrastive learning. We theoretically show that minimising the proposed ELM objective helps reduce the generalisation gap. The ELM method is shown to perform well empirically, and results in tighter tail class embeddings.
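The logit-space margin idea that ELM builds on can be sketched with a small example: adding a class-prior-dependent offset to the logits before the softmax, so a tail class must be predicted with a larger raw logit gap to achieve the same loss. This is only the logit-margin ingredient; ELM's embedding-space regularizer is not reproduced here, and `tau` is an illustrative hyperparameter.

```python
import math

def logit_margin_ce(logits, label, class_priors, tau=1.0):
    """Cross-entropy with a per-class additive margin tau * log(prior)
    applied to the logits: rare classes get a more negative offset, so
    correctly classifying them requires a larger raw logit gap."""
    adjusted = [z + tau * math.log(p) for z, p in zip(logits, class_priors)]
    m = max(adjusted)  # log-sum-exp with the usual max shift for stability
    log_z = m + math.log(sum(math.exp(a - m) for a in adjusted))
    return log_z - adjusted[label]

priors = [0.9, 0.1]      # head class vs. tail class
logits = [1.0, 1.0]      # tied raw logits
loss_head = logit_margin_ce(logits, 0, priors)
loss_tail = logit_margin_ce(logits, 1, priors)
# With tied logits, the tail-class loss is larger: the model is pushed to
# separate tail classes by a wider margin than head classes.
```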
    Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation. (arXiv:2204.13170v1 [cs.LG])
    In Federated Learning a number of clients collaborate to train a model without sharing their data. Client models are optimized locally and are communicated through a central hub called the server. A major challenge is to deal with heterogeneity among clients' data, which causes the local optimization to drift away with respect to the global objective. In order to estimate and therefore remove this drift, variance reduction techniques have recently been incorporated into Federated Learning optimization. However, the existing solutions propagate the error of their estimations throughout the optimization trajectory, which leads to inaccurate approximations of the clients' drift and, ultimately, failure to remove it properly. In this paper, we address this issue by introducing an adaptive algorithm that efficiently reduces clients' drift. Compared to previous work on adapting variance reduction to Federated Learning, our approach uses less or the same level of communication bandwidth, computation, or memory. Additionally, it addresses the instability problem prevalent in prior work, caused by the increasing norm of the estimates, which makes our approach a much more practical solution for large-scale Federated Learning settings. Our experimental results demonstrate that the proposed algorithm converges significantly faster and achieves higher accuracy than the baselines in an extensive set of Federated Learning benchmarks.
    Continual Learning with Bayesian Model based on a Fixed Pre-trained Feature Extractor. (arXiv:2204.13349v1 [cs.LG])
    Deep learning has shown its human-level performance in various applications. However, current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes. This poses a challenge particularly in intelligent diagnosis systems where initially only training data of a limited number of diseases are available. In this case, updating the intelligent system with data of new diseases would inevitably downgrade its performance on previously learned diseases. Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e.g. with Gaussian mixture models, and naturally kept from forgetting in continual learning over time. Unlike existing class-incremental learning methods, the proposed approach is not sensitive to the continual learning process and can additionally be applied to the data-incremental learning scenario. Experiments on multiple medical and natural image classification tasks showed that the proposed approach outperforms state-of-the-art approaches, even those that keep some images of old classes during continual learning of new classes.
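The core idea above, storing each class only as statistical distributions over a fixed extractor's features, can be sketched as follows. This toy uses a single diagonal Gaussian per class rather than a full Gaussian mixture, and hand-written 2-D vectors stand in for the pre-trained extractor's outputs; the class names are illustrative.

```python
import math

class GaussianClassMemory:
    """Each class is stored only as per-feature Gaussian statistics over a
    fixed feature extractor's outputs, so adding a new class never touches
    (and therefore never forgets) previously learned classes."""
    def __init__(self):
        self.stats = {}   # label -> (means, variances)

    def add_class(self, label, features):
        d = len(features[0])
        means = [sum(f[j] for f in features) / len(features) for j in range(d)]
        varis = [max(sum((f[j] - means[j]) ** 2 for f in features)
                     / len(features), 1e-6) for j in range(d)]
        self.stats[label] = (means, varis)

    def predict(self, x):
        def log_lik(label):
            means, varis = self.stats[label]
            return -sum(((xj - m) ** 2) / (2 * v) + 0.5 * math.log(v)
                        for xj, m, v in zip(x, means, varis))
        return max(self.stats, key=log_lik)

mem = GaussianClassMemory()
mem.add_class("a", [[0.0, 0.0], [0.2, -0.1], [-0.1, 0.1]])
# A new class arrives later and is added without revisiting class "a".
mem.add_class("b", [[3.0, 3.0], [3.2, 2.9], [2.9, 3.1]])
```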
    AutoLossGen: Automatic Loss Function Generation for Recommender Systems. (arXiv:2204.13160v1 [cs.IR])
    In recommendation systems, the choice of loss function is critical since a good loss may significantly improve the model performance. However, manually designing a good loss is a big challenge due to the complexity of the problem. A large fraction of previous work focuses on handcrafted loss functions, which requires significant expertise and human effort. In this paper, inspired by the recent development of automated machine learning, we propose an automatic loss function generation framework, AutoLossGen, which is able to generate loss functions directly constructed from basic mathematical operators without prior knowledge on loss structure. More specifically, we develop a controller model driven by reinforcement learning to generate loss functions, and develop an iterative and alternating optimization schedule to update the parameters of both the controller model and the recommender model. One challenge for automatic loss generation in recommender systems is the extreme sparsity of recommendation datasets, which leads to the sparse reward problem for loss generation and search. To solve the problem, we further develop a reward filtering mechanism for efficient and effective loss generation. Experimental results show that our framework manages to create tailored loss functions for different recommendation models and datasets, and the generated loss gives better recommendation performance than commonly used baseline losses. Besides, most of the generated losses are transferable, i.e., the loss generated based on one model and dataset also works well for another model or dataset. Source code of the work is available at https://github.com/rutgerswiselab/AutoLossGen.
    SwiftAgg+: Achieving Asymptotically Optimal Communication Loads in Secure Aggregation for Federated Learning. (arXiv:2203.13060v2 [cs.IT] UPDATED)
    We propose SwiftAgg+, a novel secure aggregation protocol for federated learning systems, where a central server aggregates local models of $N \in \mathbb{N}$ distributed users, each of size $L \in \mathbb{N}$, trained on their local data, in a privacy-preserving manner. SwiftAgg+ can significantly reduce the communication overheads without any compromise on security, and achieve optimal communication loads within diminishing gaps. Specifically, in presence of at most $D$ dropout users, SwiftAgg+ achieves a per-user communication load of $(1+\mathcal{O}(\frac{1}{N}))L$ and a server communication load of $(1+\mathcal{O}(\frac{1}{N}))L$, with a worst-case information-theoretic security guarantee, against any subset of up to $T$ semi-honest users who may also collude with the curious server. Moreover, the proposed SwiftAgg+ allows for a flexible trade-off between communication loads and the number of active communication links. In particular, for any $K\in\mathbb{N}$, SwiftAgg+ can achieve the server communication load of $(1+\frac{T}{K})L$, and per-user communication load of up to $(1+\frac{T+D}{K})L$, where the number of pair-wise active connections in the network is $\frac{N}{2}(K+T+D+1)$.
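The trade-off between communication load and active links can be made concrete by simply evaluating the expressions stated above. The helper below does only that; the parameter values in the example are illustrative, not taken from the paper.

```python
from fractions import Fraction

def swiftagg_loads(N, T, D, K, L=1):
    """Evaluate the loads stated in the abstract: server communication load
    (1 + T/K) * L, worst-case per-user load up to (1 + (T + D)/K) * L, and
    N/2 * (K + T + D + 1) pairwise active connections."""
    server = (1 + Fraction(T, K)) * L
    per_user = (1 + Fraction(T + D, K)) * L
    links = Fraction(N, 2) * (K + T + D + 1)
    return server, per_user, links

# Illustrative numbers: 100 users, up to 10 semi-honest colluders,
# up to 5 dropouts, and design parameter K = 10 (per-symbol loads, L = 1).
server, per_user, links = swiftagg_loads(N=100, T=10, D=5, K=10)
```

Increasing K drives both loads toward the optimal (1 + o(1))L at the price of more active links, which is exactly the flexible trade-off the abstract describes.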
    Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces. (arXiv:2202.10613v3 [stat.ML] UPDATED)
    Bayesian learning using Gaussian processes provides a foundational framework for making decisions in a manner that balances what is known with what could be learned by gathering data. In this dissertation, we develop techniques for broadening the applicability of Gaussian processes. This is done in two ways. Firstly, we develop pathwise conditioning techniques for Gaussian processes, which allow one to express posterior random functions as prior random functions plus a dependent update term. We introduce a wide class of efficient approximations built from this viewpoint, which can be randomly sampled once in advance, and evaluated at arbitrary locations without any subsequent stochasticity. This key property improves efficiency and makes it simpler to deploy Gaussian process models in decision-making settings. Secondly, we develop a collection of Gaussian process models over non-Euclidean spaces, including Riemannian manifolds and graphs. We derive fully constructive expressions for the covariance kernels of scalar-valued Gaussian processes on Riemannian manifolds and graphs. Building on these ideas, we describe a formalism for defining vector-valued Gaussian processes on Riemannian manifolds. The introduced techniques allow all of these models to be trained using standard computational methods. In total, these contributions make Gaussian processes easier to work with and allow them to be used within a wider class of domains in an effective and principled manner. This, in turn, makes it possible to potentially apply Gaussian processes to novel decision-making settings.
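The pathwise-conditioning viewpoint, posterior sample = prior sample + dependent update term (Matheron's rule), can be sketched in the simplest possible setting: one noise-free observation and a squared-exponential kernel, with the joint prior draw done via a hand-rolled 2x2 Cholesky factor. This is a minimal illustration of the identity, not the paper's efficient approximations.

```python
import math
import random

def rbf(a, b, ls=1.0):
    """Squared-exponential (RBF) kernel with lengthscale ls."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)

def posterior_sample_at(x_star, x_tr, y_tr, rng):
    """Matheron's rule with one noise-free observation: draw
    (f(x_tr), f(x_star)) jointly from the prior, then add the update
    k(x_star, x_tr) / k(x_tr, x_tr) * (y_tr - f(x_tr))."""
    k11, k12, k22 = rbf(x_tr, x_tr), rbf(x_tr, x_star), rbf(x_star, x_star)
    # Joint prior draw via a 2x2 Cholesky factor of [[k11, k12], [k12, k22]].
    l11 = math.sqrt(k11)
    l21 = k12 / l11
    l22 = math.sqrt(max(k22 - l21 ** 2, 1e-12))
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    f_tr = l11 * z1
    f_star = l21 * z1 + l22 * z2
    return f_star + (k12 / k11) * (y_tr - f_tr)  # prior sample + update term

rng = random.Random(0)
# At the training input itself, every posterior sample must hit the datum.
at_datum = posterior_sample_at(0.5, 0.5, 1.7, rng)
# Far from the data, the posterior reverts to the zero-mean prior.
far = sum(posterior_sample_at(10.0, 0.5, 1.7, rng) for _ in range(2000)) / 2000
```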
    Model-Based Safe Policy Search from Signal Temporal Logic Specifications Using Recurrent Neural Networks. (arXiv:2103.15938v2 [eess.SY] UPDATED)
    We propose a policy search approach to learn controllers from specifications given as Signal Temporal Logic (STL) formulae. The system model, which is unknown but assumed to be an affine control system, is learned together with the control policy. The model is implemented as two feedforward neural networks (FNNs) - one for the drift, and one for the control directions. To capture the history dependency of STL specifications, we use a recurrent neural network (RNN) to implement the control policy. In contrast to prevalent model-free methods, the learning approach proposed here takes advantage of the learned model and is more efficient. We use control barrier functions (CBFs) with the learned model to improve the safety of the system. We validate our algorithm via simulations and experiments. The results show that our approach can satisfy the given specification within very few system runs, and can be used for on-line control.
    Compositional Federated Learning for Distributionally Robust and Meta Learning. (arXiv:2106.11264v2 [cs.LG] UPDATED)
    In this paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many data mining and machine learning problems with a hierarchical structure such as distributionally robust FL and model-agnostic meta learning (MAML). Moreover, we study the convergence analysis of our ComFedL algorithm under some mild conditions, and prove that it achieves a convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iterations. To the best of our knowledge, our new Compositional FL framework is the first work to bridge federated learning with composition stochastic optimization. In particular, we first transform the distributionally robust FL (i.e., a minimax optimization problem) into a simple composition optimization problem by using KL divergence regularization. We likewise transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple yet effective composition optimization problem. Finally, we apply two popular machine learning tasks, i.e., distributionally robust FL and MAML, to demonstrate the effectiveness of our algorithm.
    ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution. (arXiv:2101.07415v5 [cs.LG] UPDATED)
    In this paper, we approach the problem of optimizing blackbox functions over large hybrid search spaces consisting of both combinatorial and continuous parameters. We demonstrate that previous evolutionary algorithms which rely on mutation-based approaches, while flexible over combinatorial spaces, suffer from a curse of dimensionality in high dimensional continuous spaces both theoretically and empirically, which thus limits their scope over hybrid search spaces as well. In order to combat this curse, we propose ES-ENAS, a simple and modular joint optimization procedure combining the class of sample-efficient smoothed-gradient techniques, commonly known as Evolutionary Strategies (ES), with combinatorial optimizers in a highly scalable and intuitive way, inspired by the one-shot or supernet paradigm introduced in Efficient Neural Architecture Search (ENAS). By doing so, we achieve significantly greater sample efficiency, which we empirically demonstrate over synthetic benchmarks, and are further able to apply ES-ENAS for architecture search over popular RL benchmarks.
    Classifier Calibration: with application to threat scores in cybersecurity. (arXiv:2102.05143v3 [cs.LG] UPDATED)
    This paper explores the calibration of a classifier output score in binary classification problems. A calibrator is a function that maps the arbitrary classifier score, of a testing observation, onto $[0,1]$ to provide an estimate for the posterior probability of belonging to one of the two classes. Calibration is important for two reasons: first, it provides a meaningful score, that is, the posterior probability; second, it puts the scores of different classifiers on the same scale for comparable interpretation. The paper presents three main contributions: (1) Introducing multi-score calibration, when more than one classifier provides a score for a single observation. (2) Introducing the idea that the classifier scores to a calibration process are nothing but features to a classifier, hence proposing to expand the classifier scores to higher dimensions to boost the calibrator's performance. (3) Conducting a massive simulation study, in the order of 24,000 experiments, that incorporates different configurations, in addition to experimenting on two real datasets from the cybersecurity domain. The results show that there is no overall winner among the different calibrators and different configurations. However, general advice for practitioners includes the following: the Platt's calibrator~\citep{Platt1999ProbabilisticOutputsForSupport}, a version of the logistic regression that decreases bias for a small sample size, has very stable and acceptable performance across all experiments; our suggested multi-score calibration provides better performance than single-score calibration in the majority of experiments, including the two real datasets. In addition, expanding the scores can help in some experiments.
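The Platt calibrator recommended above is just a one-dimensional logistic regression on the classifier score; a minimal sketch follows. (The classic method additionally smooths the 0/1 targets to reduce small-sample bias, which is omitted here; the learning rate and step count are illustrative.)

```python
import math

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Platt scaling: fit sigma(a * s + b) to binary labels by gradient
    descent on the logistic loss, mapping raw scores onto [0, 1] so they
    can be read as posterior probabilities."""
    a, b, n = 1.0, 0.0, len(scores)
    for _ in range(steps):
        ga = gb = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            ga += (p - y) * s / n
            gb += (p - y) / n
        a -= lr * ga
        b -= lr * gb
    return lambda s: 1.0 / (1.0 + math.exp(-(a * s + b)))

# Toy raw scores: positives tend to score higher than negatives.
raw = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 0, 1, 1, 1]
calibrate = fit_platt(raw, y)
```

The multi-score calibration proposed in the paper would feed several classifiers' scores into a multivariate calibrator in the same spirit; the sketch above covers only the single-score case.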
    Consistent Relative Confidence and Label-Free Model Selection for Convolutional Neural Networks. (arXiv:2108.11845v7 [cs.CV] UPDATED)
    This letter is concerned with image classification with deep convolutional neural networks (CNNs). The focus is on the following question: given a set of candidate CNN models, how to select the right one with the best generalization property for the current task? Present model selection methods require access to a batch of labeled data for computing a pre-specified performance metric, such as the cross-entropy loss, the classification error rate, or the negative log-likelihood. In many practical cases, labels are not available in time as labeling itself is a time-consuming and expensive task. To this end, this letter presents an approach to CNN model selection using only unlabeled data. This method is developed based on a principle termed consistent relative confidence. The effectiveness and efficiency of the proposed method are demonstrated by experiments using benchmark datasets.
    Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems. (arXiv:2007.03278v3 [cs.LG] UPDATED)
    Emerging cross-device artificial intelligence (AI) applications require a transition from conventional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform complex learning tasks. In this regard, democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems. The outlined principles are meant to study a generalization in distributed learning systems that goes beyond existing mechanisms such as federated learning. Moreover, such learning systems rely on hierarchical self-organization of well-connected distributed learning agents which have limited and highly personalized data and can evolve and regulate themselves based on the underlying duality of specialized and generalized processes. Inspired by the Dem-AI philosophy, a novel distributed learning approach is proposed in this paper. The approach consists of a self-organizing hierarchical structuring mechanism based on agglomerative clustering, hierarchical generalization, and a corresponding learning mechanism. Subsequently, hierarchical generalized learning problems in recursive forms are formulated and shown to be approximately solved using the solutions of distributed personalized learning problems and hierarchical update mechanisms. To that end, a distributed learning algorithm, namely DemLearn, is proposed. Extensive experiments on the benchmark MNIST, Fashion-MNIST, FE-MNIST, and CIFAR-10 datasets show that the proposed algorithm achieves better generalization performance of the agents' learning models than conventional federated learning (FL) algorithms. The detailed analysis provides useful observations to further handle both the generalization and specialization performance of the learning models in Dem-AI systems.
    Quality Inference in Federated Learning with Secure Aggregation. (arXiv:2007.06236v3 [cs.LG] UPDATED)
    Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data. Despite no data being shared explicitly, recent studies have shown that the mechanism can still leak sensitive information. Hence, secure aggregation is utilized in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality of individual training datasets and show that such quality information can be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to detect misbehaviours, to stabilize training performance, and to measure the individual contributions of participants.
    Computer Vision for Road Imaging and Pothole Detection: A State-of-the-Art Review of Systems and Algorithms. (arXiv:2204.13590v1 [cs.CV])
    Computer vision algorithms have been prevalently utilized for 3-D road imaging and pothole detection for over two decades. Nonetheless, there is a lack of systematic survey articles on the state-of-the-art (SoTA) computer vision techniques, especially deep learning models, developed to tackle these problems. This article first introduces the sensing systems employed for 2-D and 3-D road data acquisition, including camera(s), laser scanners, and Microsoft Kinect. Afterward, it thoroughly and comprehensively reviews the SoTA computer vision algorithms developed for road pothole detection, including (1) classical 2-D image processing, (2) 3-D point cloud modeling and segmentation, and (3) machine/deep learning. This article also discusses the existing challenges and future development trends of computer vision-based road pothole detection approaches: classical 2-D image processing-based and 3-D point cloud modeling and segmentation-based approaches have already become history, while convolutional neural networks (CNNs) have demonstrated compelling road pothole detection results and are promising to break the bottleneck with future advances in self/un-supervised learning for multi-modal semantic segmentation. We believe that this survey can serve as practical guidance for developing the next-generation road condition assessment systems.
    Russian Texts Detoxification with Levenshtein Editing. (arXiv:2204.13638v1 [cs.CL])
    Text detoxification is a style transfer task of creating neutral versions of toxic texts. In this paper, we use the concept of text editing to build a two-step tagging-based detoxification model using a parallel corpus of Russian texts. With this model, we achieved the best style transfer accuracy among all models in the RUSSE Detox shared task, surpassing larger sequence-to-sequence models.
    Nonbacktracking spectral clustering of nonuniform hypergraphs. (arXiv:2204.13586v1 [cs.SI])
    Spectral methods offer a tractable, global framework for clustering in graphs via eigenvector computations on graph matrices. Hypergraph data, in which entities interact on edges of arbitrary size, poses challenges for matrix representations and therefore for spectral clustering. We study spectral clustering for nonuniform hypergraphs based on the hypergraph nonbacktracking operator. After reviewing the definition of this operator and its basic properties, we prove a theorem of Ihara-Bass type to enable faster computation of eigenpairs. We then propose an alternating algorithm for inference in a hypergraph stochastic blockmodel via linearized belief-propagation, offering proofs that both formalize and extend several previous results. We perform experiments in real and synthetic data that underscore the benefits of hypergraph methods over graph-based ones when interactions of different sizes carry different information about cluster structure. Through an analysis of our algorithm, we pose several conjectures about the limits of spectral methods and detectability in hypergraph stochastic blockmodels writ large.
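For intuition about the operator underlying this approach, here is a small sketch of the classical graph (not hypergraph) nonbacktracking matrix, indexed by directed edges; the hypergraph generalization studied in the paper is more involved, so this is only a simplified illustration.

```python
import numpy as np

def nonbacktracking_matrix(edges):
    """Build the nonbacktracking operator B of an undirected graph.
    Rows/columns are indexed by directed edges (u -> v), and
    B[(u -> v), (v -> w)] = 1 whenever w != u (no immediate backtrack)."""
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    index = {e: i for i, e in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for (u, v) in directed:
        for (x, w) in directed:
            if x == v and w != u:
                B[index[(u, v)], index[(x, w)]] = 1.0
    return B

# Triangle graph: each directed edge has exactly one nonbacktracking
# successor, so B is a permutation matrix made of two 3-cycles.
B = nonbacktracking_matrix([(0, 1), (1, 2), (2, 0)])
```

Spectral clustering methods of this family then compute eigenpairs of B (or, via Ihara-Bass-type results, of a smaller companion matrix) and read cluster structure off the leading eigenvectors.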
    PhysioGAN: Training High Fidelity Generative Model for Physiological Sensor Readings. (arXiv:2204.13597v1 [eess.SP])
    Generative models such as the variational autoencoder (VAE) and generative adversarial networks (GANs) have proven to be incredibly powerful for generating synthetic data that preserves the statistical properties and utility of real-world datasets, especially in the context of images and natural language text. Nevertheless, there has so far been no successful demonstration of how to apply either method to generate useful physiological sensory data; the state-of-the-art techniques in this context have achieved only limited success. We present PHYSIOGAN, a generative model to produce high-fidelity synthetic physiological sensor data readings. PHYSIOGAN consists of an encoder, a decoder, and a discriminator. We evaluate PHYSIOGAN against the state-of-the-art techniques using two different real-world datasets: an ECG classification dataset and an activity recognition from motion sensors dataset. We compare PHYSIOGAN to the baseline models not only on the accuracy of class-conditional generation but also on the sample diversity and sample novelty of the synthetic datasets. We show that PHYSIOGAN generates samples with higher utility than other generative models: classification models trained only on synthetic data generated by PHYSIOGAN suffer merely a 10% and 20% decrease in classification accuracy relative to classification models trained on the real data. Furthermore, we demonstrate the use of PHYSIOGAN for sensor data imputation, where it creates plausible results.
    Personalized Federated Learning with Multiple Known Clusters. (arXiv:2204.13619v1 [cs.LG])
    We consider the problem of personalized federated learning when there are known cluster structures within users. An intuitive approach would be to regularize the parameters so that users in the same cluster share similar model weights. The distances between the clusters can then be regularized to reflect the similarity between different clusters of users. We develop an algorithm that allows each cluster to communicate independently and derive the convergence results. We study a hierarchical linear model to theoretically demonstrate that our approach outperforms agents learning independently and agents learning a single shared weight. Finally, we demonstrate the advantages of our approach using both simulated and real-world data.
    Zero-Shot Logit Adjustment. (arXiv:2204.11822v2 [cs.CV] UPDATED)
    Semantic-descriptor-based Generalized Zero-Shot Learning (GZSL) poses challenges in recognizing novel classes in the test phase. The development of generative models enables current GZSL techniques to probe further into the semantic-visual link, culminating in a two-stage form that includes a generator and a classifier. However, existing generation-based methods focus on enhancing the generator's effect while neglecting improvement of the classifier. In this paper, we first analyze two properties of the generated pseudo unseen samples: bias and homogeneity. Then, we perform variational Bayesian inference to back-derive the evaluation metrics, which reflect the balance of the seen and unseen classes. As a consequence of our derivation, the aforementioned two properties are incorporated into the classifier training as seen-unseen priors via logit adjustment. The Zero-Shot Logit Adjustment further puts semantic-based classifiers into effect in generation-based GZSL. Our experiments demonstrate that the proposed technique achieves state-of-the-art performance when combined with the basic generator, and it can improve various generative zero-shot learning frameworks. Our code is available at https://github.com/cdb342/IJCAI-2022-ZLA.
    Continual Learning for Peer-to-Peer Federated Learning: A Study on Automated Brain Metastasis Identification. (arXiv:2204.13591v1 [cs.LG])
    Due to data privacy constraints, data sharing among multiple centers is restricted. Continual learning, as one approach to peer-to-peer federated learning, can promote multicenter collaboration on deep learning algorithm development by sharing intermediate models instead of training data. This work aims to investigate the feasibility of continual learning for multicenter collaboration on an exemplary application of brain metastasis identification using DeepMedic. 920 contrast-enhanced T1 MRI volumes are split to simulate multicenter collaboration scenarios. A continual learning algorithm, synaptic intelligence (SI), is applied to preserve important model weights when training one center after another. In a bilateral collaboration scenario, continual learning with SI achieves a sensitivity of 0.917, and naive continual learning without SI achieves a sensitivity of 0.906, while two models trained solely on internal data without continual learning achieve sensitivities of only 0.853 and 0.831. In a seven-center multilateral collaboration scenario, the models trained on internal datasets (100 volumes per center) without continual learning obtain a mean sensitivity of 0.725. With single-visit continual learning (i.e., the shared model visits each center only once during training), the sensitivity is improved to 0.788 and 0.849 without SI and with SI, respectively. With iterative continual learning (i.e., the shared model revisits each center multiple times during training), the sensitivity is further improved to 0.914, which is identical to the sensitivity obtained using mixed data for training. Our experiments demonstrate that continual learning can improve brain metastasis identification performance for centers with limited data. This study demonstrates the feasibility of applying continual learning for peer-to-peer federated learning in multicenter collaboration.
    On tuning a mean-field model for semi-supervised classification. (arXiv:2204.13519v1 [cs.LG])
    Semi-supervised learning (SSL) has become an interesting research area due to its capacity for learning in scenarios where both labeled and unlabeled data are available. In this work, we focus on the task of transduction - when the objective is to label all data presented to the learner - with a mean-field approximation to the Potts model. Aiming at this particular task, we study how classification results depend on the inverse temperature $\beta$ and find that the optimal phase depends highly on the amount of labeled data available. In the same study, we also observe that classifications that are more stable under small fluctuations in $\beta$ are related to configurations of high probability, and we propose a tuning approach based on this observation. The method relies on a novel parameter $\gamma$, and we evaluate two different values of this quantity in comparison with classical methods in the field. This evaluation is conducted by varying the amount of labeled data available and the number of nearest neighbors in the similarity graph. Empirical results show that the tuning method is effective and allows NMF to outperform other approaches in datasets with fewer classes. In addition, one of the chosen values for $\gamma$ also leads to results that are more resilient to changes in the number of neighbors, which might be of interest to practitioners in the field of SSL.
    EVI: Multilingual Spoken Dialogue Tasks and Dataset for Knowledge-Based Enrolment, Verification, and Identification. (arXiv:2204.13496v1 [cs.CL])
    Knowledge-based authentication is crucial for task-oriented spoken dialogue systems that offer personalised and privacy-focused services. Such systems should be able to enrol (E), verify (V), and identify (I) new and recurring users based on their personal information, e.g. postcode, name, and date of birth. In this work, we formalise the three authentication tasks and their evaluation protocols, and we present EVI, a challenging spoken multilingual dataset with 5,506 dialogues in English, Polish, and French. Our proposed models set the first competitive benchmarks, explore the challenges of multilingual natural language processing of spoken dialogue, and set directions for future research.
    Learning General Inventory Management Policy for Large Supply Chain Network. (arXiv:2204.13378v1 [cs.AI])
    Inventory management in warehouses directly affects the profits made by manufacturers. In particular, large manufacturers produce a very large variety of products that are handled by a significantly large number of retailers. In such cases, the computational complexity of classical inventory management algorithms is inordinately large. In recent years, learning-based approaches have become popular for addressing such problems. However, previous studies have not managed systems where both the number of products and the number of retailers are large. This study proposes a reinforcement learning-based warehouse inventory management algorithm that can be used for supply chain systems where both the number of products and the number of retailers are large. To solve the computational problem of handling large systems, we provide a means of approximate simulation of the system in the training phase. Our experiments on both real and artificial data demonstrate that our algorithm with approximated simulation can successfully handle large supply chain networks.
    Mixup-based Deep Metric Learning Approaches for Incomplete Supervision. (arXiv:2204.13572v1 [cs.LG])
    Deep learning architectures have achieved promising results in different areas (e.g., medicine, agriculture, and security). However, using those powerful techniques in many real applications becomes challenging due to the large labeled collections required during training. Several works have pursued solutions to overcome it by proposing strategies that can learn more for less, e.g., weakly and semi-supervised learning approaches. As these approaches do not usually address memorization and sensitivity to adversarial examples, this paper presents three deep metric learning approaches combined with Mixup for incomplete-supervision scenarios. We show that some state-of-the-art approaches in metric learning might not work well in such scenarios. Moreover, the proposed approaches outperform most of them in different datasets.
    Adversarial Fine-tune with Dynamically Regulated Adversary. (arXiv:2204.13232v1 [cs.LG])
    Adversarial training is an effective method to boost model robustness to malicious, adversarial attacks. However, such improvement in model robustness often comes at a significant sacrifice of standard performance on clean images. In many real-world applications, such as health diagnosis and autonomous surgical robotics, standard performance is valued more than model robustness against such extremely malicious attacks. This leads to the question: to what extent can we boost model robustness without sacrificing standard performance? This work tackles this problem and proposes a simple yet effective transfer learning-based adversarial training strategy that disentangles the negative effects of adversarial samples on the model's standard performance. In addition, we introduce a training-friendly adversarial attack algorithm, which facilitates the boost of adversarial robustness without introducing significant training complexity. Extensive experimentation indicates that the proposed method outperforms previous adversarial training algorithms towards the target: improving model robustness while preserving the model's standard performance on clean data.
    Semantic Communication: An Information Bottleneck View. (arXiv:2204.13366v1 [cs.IT])
    Motivated by recent success of machine learning tools at the PHY layer and driven by high bandwidth demands of the next wireless communication standard 6G, the old idea of semantic communication by Weaver from 1949 has received considerable attention. It breaks with the classic design paradigm according to Shannon by aiming to transmit the meaning of a message rather than its exact copy and thus potentially allows for savings in bandwidth. In this work, inspired by Weaver, we propose an information-theoretic framework where the semantic context is explicitly introduced into probabilistic models. In particular, for bandwidth efficient transmission, we define semantic communication system design as an Information Bottleneck optimization problem and consider important implementation aspects. Further, we uncover the restrictions of the classic 5G communication system design w.r.t. semantic context. Notably, based on the example of distributed image classification, we reveal the huge potential of a semantic communication system design. Numerical results show a tremendous saving in bandwidth of 20 dB with our proposed approach ISCNet compared to a classic PHY layer design.
    WeaNF: Weak Supervision with Normalizing Flows. (arXiv:2204.13409v1 [cs.CL])
    A popular approach to decrease the need for costly manual annotation of large data sets is weak supervision, which introduces problems of noisy labels, coverage and bias. Methods for overcoming these problems have relied either on discriminative models, trained with cost functions specific to weak supervision, or, more recently, on generative models that try to model the output of the automatic annotation process. In this work, we explore a novel direction of generative modeling for weak supervision: instead of modeling the output of the annotation process (the labeling function matches), we generatively model the input-side data distributions (the feature space) covered by labeling functions. Specifically, we estimate a density for each weak labeling source, or labeling function, by using normalizing flows. An integral part of our method is the flow-based modeling of multiple simultaneously matching labeling functions, so that phenomena such as labeling function overlap and correlations are captured. We analyze the effectiveness and modeling capabilities on various commonly used weak supervision data sets, and show that weakly supervised normalizing flows compare favorably to standard weak supervision baselines.
    Continual Backprop: Stochastic Gradient Descent with Persistent Randomness. (arXiv:2108.06325v2 [cs.LG] UPDATED)
    The Backprop algorithm for learning in neural networks utilizes two mechanisms: first, stochastic gradient descent and second, initialization with small random weights, where the latter is essential to the effectiveness of the former. We show that in continual learning setups, Backprop performs well initially, but over time its performance degrades. Stochastic gradient descent alone is insufficient to learn continually; the initial randomness enables only initial learning but not continual learning. To the best of our knowledge, ours is the first result showing this degradation in Backprop's ability to learn. To address this degradation in Backprop's plasticity, we propose an algorithm that continually injects random features alongside gradient descent using a new generate-and-test process. We call this the \textit{Continual Backprop} algorithm. We show that, unlike Backprop, Continual Backprop is able to continually adapt in both supervised and reinforcement learning (RL) problems. Continual Backprop has the same computational complexity as Backprop and can be seen as a natural extension of Backprop for continual learning.
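The generate-and-test idea behind Continual Backprop can be caricatured in a few lines: score each hidden unit by a utility measure and replace the least useful units with fresh small random weights, restoring the initial randomness that makes gradient descent effective. The utility definition, replacement fraction, and demo below are illustrative placeholders, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

def reinit_low_utility_units(W_in, W_out, activations, frac=0.1):
    """Generate-and-test sketch: score hidden units by a simple utility
    (mean |activation| times total |outgoing weight|), then reinitialize
    the lowest-utility units with fresh small random input weights and
    zero outgoing weights so new units start with no output influence."""
    utility = np.mean(np.abs(activations), axis=0) * np.abs(W_out).sum(axis=1)
    k = max(1, int(frac * W_in.shape[1]))
    worst = np.argsort(utility)[:k]
    W_in[:, worst] = rng.normal(0.0, 0.1, size=(W_in.shape[0], k))
    W_out[worst, :] = 0.0
    return worst

# Demo: a hidden layer with 10 units; unit 3 is dead (zero activation
# and zero outgoing weight), so it should be the one replaced.
W_in = rng.normal(size=(4, 10))
W_out = rng.normal(size=(10, 2))
acts = np.abs(rng.normal(size=(32, 10))) + 0.1
acts[:, 3] = 0.0
W_out[3, :] = 0.0
replaced = reinit_low_utility_units(W_in, W_out, acts)
```

In a full training loop, a step like this would run periodically alongside ordinary gradient updates, continually injecting new random features as the abstract describes.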
    Schr\"odinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning Training. (arXiv:2204.13666v1 [cs.LG])
    We introduce a software-hardware co-design approach to reduce memory traffic and footprint during training with BFloat16 or FP32, boosting energy efficiency and execution time performance. We introduce methods to dynamically adjust the size and format of the floating-point containers used to store activations and weights during training. The different value distributions lead us to different approaches for exponents and mantissas. Gecko exploits the favourable exponent distribution with a lossless delta encoding approach to reduce the total exponent footprint by up to $58\%$ in comparison to a 32-bit floating-point baseline. To contend with the noisy mantissa distributions, we present two lossy methods to eliminate as many least-significant bits as possible without affecting accuracy. Quantum Mantissa is a machine-learning-first mantissa compression method that taps into training's gradient descent algorithm to also learn minimal mantissa bitlengths at a per-layer granularity, obtaining up to a $92\%$ reduction in total mantissa footprint. Alternatively, BitChop observes changes in the loss function during training to adjust the mantissa bitlength network-wide, yielding an $81\%$ reduction in footprint. Schr\"{o}dinger's FP implements hardware encoders/decoders that, guided by Gecko/Quantum Mantissa or Gecko/BitChop, transparently encode/decode values when transferring to/from off-chip memory, boosting energy efficiency and reducing execution time.
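To make the exponent side concrete: an IEEE-754 float32 stores an 8-bit biased exponent, and neighboring weights tend to have similar magnitudes, so delta-encoding the exponent stream is lossless and the deltas are typically tiny. The sketch below illustrates only this general idea; Gecko's actual encoding scheme is not specified in the summary above.

```python
import struct

def fp32_exponent(x):
    """Extract the 8-bit biased exponent field from an IEEE-754 float32."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    return (bits >> 23) & 0xFF

def delta_encode(values):
    """Lossless delta encoding of the exponent stream: store the first
    exponent verbatim, then successive differences."""
    exps = [fp32_exponent(v) for v in values]
    return [exps[0]] + [b - a for a, b in zip(exps, exps[1:])]

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    exps, acc = [], 0
    for d in deltas:
        acc += d
        exps.append(acc)
    return exps

# Weights of similar magnitude produce deltas in {-1, 0, 1}, which a
# back-end entropy coder could store in far fewer than 8 bits each.
weights = [0.5, 0.25, 0.3, 0.7, 1.5]
deltas = delta_encode(weights)
```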
    Interpretable Graph Convolutional Network of Multi-Modality Brain Imaging for Alzheimer's Disease Diagnosis. (arXiv:2204.13188v1 [cs.LG])
    Identification of brain regions related to specific neurological disorders is of great importance for biomarker and diagnostic studies. In this paper, we propose an interpretable Graph Convolutional Network (GCN) framework for the identification and classification of Alzheimer's disease (AD) using multi-modality brain imaging data. Specifically, we extended the Gradient Class Activation Mapping (Grad-CAM) technique to quantify the most discriminative features identified by the GCN from brain connectivity patterns. We then utilized them to find signature regions of interest (ROIs) by detecting the differences of features between regions in the healthy control (HC), mild cognitive impairment (MCI), and AD groups. We conducted experiments on the ADNI database with imaging data from three modalities, including VBM-MRI, FDG-PET, and AV45-PET, and showed that the ROI features learned by our method were effective for enhancing the performance of both clinical score prediction and disease status identification. It also successfully identified biomarkers associated with AD and MCI.  ( 2 min )
    BAGNet: Bidirectional Aware Guidance Network for Malignant Breast lesions Segmentation. (arXiv:2204.13342v1 [eess.IV])
    Breast lesion segmentation is an important step in computer-aided diagnosis systems, and it has attracted much attention. However, accurate segmentation of malignant breast lesions is a challenging task due to the effects of heterogeneous structure and similar intensity distributions. In this paper, a novel bidirectional aware guidance network (BAGNet) is proposed to segment malignant lesions in breast ultrasound images. Specifically, the bidirectional aware guidance network is used to capture the context between global (low-level) and local (high-level) features from the input coarse saliency map. The introduction of the global feature map can reduce the interference of surrounding tissue (background) on the lesion regions. To evaluate the segmentation performance of the network, we compared it with several state-of-the-art medical image segmentation methods on the public breast ultrasound dataset using six commonly used evaluation metrics. Extensive experimental results indicate that our method achieves the most competitive segmentation results on malignant breast ultrasound images.  ( 2 min )
    Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers. (arXiv:2204.13326v1 [cs.LG])
    Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many well-studied tasks like behavior cloning, offline RL, inverse dynamics, and waypoint conditioning correspond to different sequence maskings over a sequence of states, actions, and returns. We introduce the FlexiBiT framework, which provides a unified way to specify models which can be trained on many different sequential decision making tasks. We show that a single FlexiBiT model is simultaneously capable of carrying out many tasks with performance similar to or better than specialized models. Additionally, we show that performance can be further improved by fine-tuning our general model on specific tasks of interest.  ( 2 min )
    Offline Visual Representation Learning for Embodied Navigation. (arXiv:2204.13226v1 [cs.CV])
    How should we learn visual representations for embodied agents that must see and move? The status quo is tabula rasa in vivo, i.e. learning visual representations from scratch while also learning to move, potentially augmented with auxiliary tasks (e.g. predicting the action taken between two successive observations). In this paper, we show that an alternative 2-stage strategy is far more effective: (1) offline pretraining of visual representations with self-supervised learning (SSL) using large-scale pre-rendered images of indoor environments (Omnidata), and (2) online finetuning of visuomotor representations on specific tasks with image augmentations under long learning schedules. We call this method Offline Visual Representation Learning (OVRL). We conduct large-scale experiments - on 3 different 3D datasets (Gibson, HM3D, MP3D), 2 tasks (ImageNav, ObjectNav), and 2 policy learning algorithms (RL, IL) - and find that the OVRL representations lead to significant across-the-board improvements in the state of the art, on ImageNav from 29.2% to 54.2% (+25% absolute, 86% relative) and on ObjectNav from 18.1% to 23.2% (+5.1% absolute, 28% relative). Importantly, both results were achieved by the same visual encoder generalizing to datasets that were not seen during pretraining. While the benefits of pretraining sometimes diminish (or entirely disappear) with long finetuning schedules, we find that OVRL's performance gains continue to increase (not decrease) as the agent is trained for 2 billion frames of experience.  ( 2 min )
    Anomaly Detection by Leveraging Incomplete Anomalous Knowledge with Anomaly-Aware Bidirectional GANs. (arXiv:2204.13335v1 [cs.LG])
    The goal of anomaly detection is to identify anomalous samples from normal ones. In this paper, a small number of anomalies are assumed to be available at the training stage, but they are assumed to be collected only from several anomaly types, leaving the majority of anomaly types not represented in the collected anomaly dataset at all. To effectively leverage this kind of incomplete anomalous knowledge represented by the collected anomalies, we propose to learn a probability distribution that can not only model the normal samples, but also guarantee to assign low density values for the collected anomalies. To this end, an anomaly-aware generative adversarial network (GAN) is developed, which, in addition to modeling the normal samples as most GANs do, is able to explicitly avoid assigning probabilities for collected anomalous samples. Moreover, to facilitate the computation of anomaly detection criteria like reconstruction error, the proposed anomaly-aware GAN is designed to be bidirectional, attaching an encoder for the generator. Extensive experimental results demonstrate that our proposed method is able to effectively make use of the incomplete anomalous information, leading to significant performance gains compared to existing methods.  ( 2 min )
    TransHER: Translating Knowledge Graph Embedding with Hyper-Ellipsoidal Restriction. (arXiv:2204.13221v1 [cs.AI])
    Knowledge graph embedding methods are important for knowledge graph completion (link prediction) due to their robust performance and efficiency on large-magnitude datasets. One state-of-the-art method, PairRE, leverages two separate vectors for relations to model complex relations (i.e., 1-to-N, N-to-1, and N-to-N) in knowledge graphs. However, such a method strictly restricts entities on the hyper-ellipsoid surface and thus limits the optimization of entity distribution, which largely hinders the performance of knowledge graph completion. To address this problem, we propose a novel score function TransHER, which leverages relation-specific translations between head and tail entities restricted on separate hyper-ellipsoids. Specifically, given a triplet, our model first maps entities onto two separate hyper-ellipsoids and then conducts a relation-specific translation on one of them. The relation-specific translation provides TransHER with more direct guidance in optimization and the ability to learn semantic characteristics of entities with complex relations. Experimental results show that TransHER can achieve state-of-the-art performance and generalize to datasets in different domains and scales. All our code will be publicly available.  ( 2 min )
    Open challenges for Machine Learning based Early Decision-Making research. (arXiv:2204.13111v1 [cs.LG])
    More and more applications require early decisions, i.e. decisions taken as soon as possible from partially observed data. However, the later a decision is made, the more its accuracy tends to improve, since the description of the problem at hand is enriched over time. Such a compromise between the earliness and the accuracy of decisions has been particularly studied in the field of Early Time Series Classification. This paper introduces a more general problem, called Machine Learning based Early Decision Making (ML-EDM), which consists in optimizing the decision times of models in a wide range of settings where data is collected over time. After defining the ML-EDM problem, ten challenges are identified and proposed to the scientific community to guide further research in this area. These challenges open important application perspectives, discussed in this paper.  ( 2 min )
    Covariance-aware Feature Alignment with Pre-computed Source Statistics for Test-time Adaptation. (arXiv:2204.13263v1 [cs.LG])
    The accuracy of deep neural networks is degraded when the distribution of features in the test environment (target domain) differs from that of the training (source) environment. To mitigate the degradation, test-time adaptation (TTA), where a model adapts to the target domain without access to the source dataset, can be used in the test environment. However, the existing TTA methods lack feature distribution alignment between the source and target domains, which unsupervised domain adaptation mainly addresses, because accessing the source dataset is prohibited in the TTA setting. In this paper, we propose a novel TTA method, named Covariance-Aware Feature alignment (CAFe), which explicitly aligns the source and target feature distributions at test time. To perform alignment without accessing the source data, CAFe uses auxiliary feature statistics (mean and covariance) pre-computed on the source domain, which are lightweight and easily prepared. Further, to improve efficiency and stability, we propose feature grouping, which splits the feature dimensions into groups according to their correlations by using spectral clustering to avoid degeneration of the covariance matrix. We empirically show that CAFe outperforms prior TTA methods on a variety of distribution shifts.  ( 2 min )
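A minimal sketch of the core idea: at test time, penalize the distance between target-batch feature statistics and the pre-computed source mean and covariance. The squared-distance form and names below are illustrative assumptions; the paper's exact alignment objective and its spectral-clustering-based feature grouping are omitted.

```python
import numpy as np

def cafe_style_alignment_loss(feats, src_mean, src_cov):
    """Sketch of a covariance-aware alignment loss (not CAFe's exact form).

    feats: (batch, dim) target-domain features from the model.
    src_mean, src_cov: lightweight statistics pre-computed on the
    source domain before deployment.
    """
    tgt_mean = feats.mean(axis=0)
    centered = feats - tgt_mean
    tgt_cov = centered.T @ centered / max(len(feats) - 1, 1)
    # Penalize mismatch in both first- and second-order statistics.
    return (np.sum((tgt_mean - src_mean) ** 2)
            + np.sum((tgt_cov - src_cov) ** 2))
```

Minimizing such a loss with respect to the model's adaptable parameters pulls the target feature distribution toward the source one without ever touching the source dataset itself.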
    Counterfactual Explanations for Natural Language Interfaces. (arXiv:2204.13192v1 [cs.CL])
    A key challenge facing natural language interfaces is enabling users to understand the capabilities of the underlying system. We propose a novel approach for generating explanations of a natural language interface based on semantic parsing. We focus on counterfactual explanations, which are post-hoc explanations that describe to the user how they could have minimally modified their utterance to achieve their desired goal. In particular, the user provides an utterance along with a demonstration of their desired goal; then, our algorithm synthesizes a paraphrase of their utterance that is guaranteed to achieve their goal. In two user studies, we demonstrate that our approach substantially improves user performance, and that it generates explanations that more closely match the user's intent compared to two ablations.  ( 2 min )
    On the Convergence of Momentum-Based Algorithms for Federated Stochastic Bilevel Optimization Problems. (arXiv:2204.13299v1 [cs.LG])
    In this paper, we study the federated stochastic bilevel optimization problem. In particular, we develop two momentum-based algorithms for optimizing this class of problems, and we establish their convergence rates, providing their sample and communication complexities. To the best of our knowledge, this is the first work achieving such favorable theoretical results.  ( 2 min )
    Watts: Infrastructure for Open-Ended Learning. (arXiv:2204.13250v1 [cs.AI])
    This paper proposes a framework called Watts for implementing, comparing, and recombining open-ended learning (OEL) algorithms. Motivated by modularity and algorithmic flexibility, Watts atomizes the components of OEL systems to promote the study of and direct comparisons between approaches. Examining implementations of three OEL algorithms, the paper introduces the modules of the framework. The hope is for Watts to enable benchmarking and to explore new types of OEL algorithms. The repo is available at \url{https://github.com/aadharna/watts}  ( 2 min )
    R-MBO: A Multi-surrogate Approach for Preference Incorporation in Multi-objective Bayesian Optimisation. (arXiv:2204.13166v1 [stat.ML])
    Many real-world multi-objective optimisation problems rely on computationally expensive function evaluations. Multi-objective Bayesian optimisation (BO) can be used to alleviate the computation time to find an approximated set of Pareto optimal solutions. In many real-world problems, a decision-maker has some preferences on the objective functions. One approach to incorporate the preferences in multi-objective BO is to use a scalarising function and build a single surrogate model (mono-surrogate approach) on it. This approach has two major limitations. Firstly, the fitness landscape of the scalarising function and the objective functions may not be similar. Secondly, the approach assumes that the scalarising function distribution is Gaussian, and thus a closed-form expression of an acquisition function, e.g., expected improvement, can be used. We overcome these limitations by building independent surrogate models (multi-surrogate approach) on each objective function and show that the distribution of the scalarising function is not Gaussian. We approximate the distribution using the generalised extreme value distribution. We present an a-priori multi-surrogate approach to incorporate the desirable objective function values (or reference point) as the preferences of a decision-maker in multi-objective BO. The results and comparison with the existing mono-surrogate approach on benchmark and real-world optimisation problems show the potential of the proposed approach.  ( 2 min )
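With independent surrogates per objective, the scalarised distribution is no longer Gaussian, so the acquisition value can be estimated by Monte Carlo rather than in closed form. The sketch below uses a Tchebycheff-style scalariser against a reference point under an independent-Gaussian surrogate assumption; the function names and the specific scalariser are illustrative, not the paper's exact choices.

```python
import numpy as np

def mc_scalarised_improvement(means, stds, ref_point, best,
                              n_samples=2000, rng=None):
    """Monte Carlo improvement estimate for a multi-surrogate approach.

    means, stds: per-objective surrogate predictions at a candidate point
    (objectives assumed independent for the sketch).
    ref_point: decision-maker's desirable objective values.
    best: best scalarised value observed so far (minimisation).
    """
    rng = np.random.default_rng(rng)
    # Sample joint objective values from the independent surrogates.
    samples = rng.normal(means, stds, size=(n_samples, len(means)))
    # Tchebycheff scalarisation w.r.t. the reference point.
    scal = np.max(samples - ref_point, axis=1)
    # Expected improvement over the incumbent, estimated empirically.
    return np.mean(np.maximum(best - scal, 0.0))
```

Because the scalarised samples are taken through the max, their empirical distribution is skewed rather than Gaussian, which is exactly why a closed-form expected improvement no longer applies.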
    Exchangeability-Aware Sum-Product Networks. (arXiv:2110.05165v2 [cs.LG] UPDATED)
    Sum-Product Networks (SPNs) are expressive probabilistic models that provide exact, tractable inference. They achieve this efficiency by making use of local independence. On the other hand, mixtures of exchangeable variable models (MEVMs) are a class of tractable probabilistic models that make use of exchangeability of discrete random variables to render inference tractable. Exchangeability, which arises naturally in relational domains, has not been considered for efficient representation and inference in SPNs yet. The contribution of this paper is a novel probabilistic model which we call Exchangeability-Aware Sum-Product Networks (XSPNs). It contains both SPNs and MEVMs as special cases, and combines the ability of SPNs to efficiently learn deep probabilistic models with the ability of MEVMs to efficiently handle exchangeable random variables. We introduce a structure learning algorithm for XSPNs and empirically show that they can be more accurate than conventional SPNs when the data contains repeated, interchangeable parts.  ( 2 min )
    Performance analysis of greedy algorithms for minimising a Maximum Mean Discrepancy. (arXiv:2101.07564v2 [stat.ML] UPDATED)
    We analyse the performance of several iterative algorithms for the quantisation of a probability measure $\mu$, based on the minimisation of a Maximum Mean Discrepancy (MMD). Our analysis includes kernel herding, greedy MMD minimisation and Sequential Bayesian Quadrature (SBQ). We show that the finite-sample-size approximation error, measured by the MMD, decreases as $1/n$ for SBQ and also for kernel herding and greedy MMD minimisation when using a suitable step-size sequence. The upper bound on the approximation error is slightly better for SBQ, but the other methods are significantly faster, with a computational cost that increases only linearly with the number of points selected. This is illustrated by two numerical examples, with the target measure $\mu$ being uniform (a space-filling design application) and with $\mu$ a Gaussian mixture. They suggest that the bounds derived in the paper are overly pessimistic, in particular for SBQ. The sources of this pessimism are identified but seem difficult to counter.  ( 2 min )
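For a finite candidate set, kernel herding admits a very short greedy form: at each step, pick the candidate whose kernel mean embedding under $\mu$ most exceeds its average kernel similarity to the points already selected. This finite-candidate simplification and the variable names are assumptions for illustration.

```python
import numpy as np

def kernel_herding(candidates, mu_embedding, n_points, kernel):
    """Greedy kernel herding over a finite candidate set.

    candidates: iterable of candidate points.
    mu_embedding[i]: approximation of E_{x~mu} k(candidates[i], x),
    the kernel mean embedding of the target measure at candidate i.
    """
    selected = []
    for _ in range(n_points):
        scores = []
        for i, x in enumerate(candidates):
            # Average similarity to points already chosen (repulsion term).
            avg = np.mean([kernel(x, s) for s in selected]) if selected else 0.0
            scores.append(mu_embedding[i] - avg)
        selected.append(candidates[int(np.argmax(scores))])
    return selected
```

The repulsion term spreads points out, which is what produces the space-filling behaviour mentioned in the uniform-measure example.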
    Tracking Most Significant Arm Switches in Bandits. (arXiv:2112.13838v5 [cs.LG] UPDATED)
    In bandits with distribution shifts, one aims to automatically adapt to unknown changes in reward distribution, and restart exploration when necessary. While this problem has been studied for many years, a recent breakthrough of Auer et al. (2018, 2019) provides the first adaptive procedure to guarantee an optimal (dynamic) regret $\sqrt{LT}$, for $T$ rounds, and an unknown number $L$ of changes. However, while this rate is tight in the worst case, it remained open whether faster rates are possible, without prior knowledge, if few changes in distribution are actually severe. To resolve this question, we propose a new notion of significant shift, which only counts very severe changes that clearly necessitate a restart: roughly, these are changes involving not only best arm switches, but also large aggregate differences in reward over time. Thus, our resulting procedure adaptively achieves rates always faster (sometimes significantly) than $O(\sqrt{ST})$, where $S\ll L$ only counts best arm switches, while at the same time, always faster than the optimal $O(V^{\frac{1}{3}}T^{\frac{2}{3}})$ when expressed in terms of total variation $V$ (which aggregates differences over time). Our results are expressed in enough generality to also capture non-stochastic adversarial settings.  ( 2 min )
    Variational Inference with NoFAS: Normalizing Flow with Adaptive Surrogate for Computationally Expensive Models. (arXiv:2108.12657v2 [cs.LG] UPDATED)
    Fast inference of numerical model parameters from data is an important prerequisite to generate predictive models for a wide range of applications. Use of sampling-based approaches such as Markov chain Monte Carlo may become intractable when each likelihood evaluation is computationally expensive. New approaches combining variational inference with normalizing flow are characterized by a computational cost that grows only linearly with the dimensionality of the latent variable space, and rely on gradient-based optimization instead of sampling, providing a more efficient approach for Bayesian inference about the model parameters. Moreover, the cost of frequently evaluating an expensive likelihood can be mitigated by replacing the true model with an offline trained surrogate model, such as neural networks. However, this approach might generate significant bias when the surrogate is insufficiently accurate around the posterior modes. To reduce the computational cost without sacrificing inferential accuracy, we propose Normalizing Flow with Adaptive Surrogate (NoFAS), an optimization strategy that alternatively updates the normalizing flow parameters and surrogate model parameters. We also propose an efficient sample weighting scheme for surrogate model training that preserves global accuracy while effectively capturing high posterior density regions. We demonstrate the inferential and computational superiority of NoFAS against various benchmarks, including cases where the underlying model lacks identifiability. The source code and numerical experiments used for this study are available at https://github.com/cedricwangyu/NoFAS.  ( 2 min )
    Gaussian Processes and Statistical Decision-making in Non-Euclidean Spaces. (arXiv:2202.10613v3 [stat.ML] UPDATED)
    Bayesian learning using Gaussian processes provides a foundational framework for making decisions in a manner that balances what is known with what could be learned by gathering data. In this dissertation, we develop techniques for broadening the applicability of Gaussian processes. This is done in two ways. Firstly, we develop pathwise conditioning techniques for Gaussian processes, which allow one to express posterior random functions as prior random functions plus a dependent update term. We introduce a wide class of efficient approximations built from this viewpoint, which can be randomly sampled once in advance, and evaluated at arbitrary locations without any subsequent stochasticity. This key property improves efficiency and makes it simpler to deploy Gaussian process models in decision-making settings. Secondly, we develop a collection of Gaussian process models over non-Euclidean spaces, including Riemannian manifolds and graphs. We derive fully constructive expressions for the covariance kernels of scalar-valued Gaussian processes on Riemannian manifolds and graphs. Building on these ideas, we describe a formalism for defining vector-valued Gaussian processes on Riemannian manifolds. The introduced techniques allow all of these models to be trained using standard computational methods. In total, these contributions make Gaussian processes easier to work with and allow them to be used within a wider class of domains in an effective and principled manner. This, in turn, makes it possible to potentially apply Gaussian processes to novel decision-making settings.  ( 2 min )
    Partitioned Variational Inference: A Framework for Probabilistic Federated Learning. (arXiv:2202.12275v4 [stat.ML] UPDATED)
    The proliferation of computing devices has brought about an opportunity to deploy machine learning models on new problem domains using previously inaccessible data. Traditional algorithms for training such models often require data to be stored on a single machine with compute performed by a single node, making them unsuitable for decentralised training on multiple devices. This deficiency has motivated the development of federated learning algorithms, which allow multiple data owners to train collaboratively and use a shared model whilst keeping local data private. However, many of these algorithms focus on obtaining point estimates of model parameters, rather than probabilistic estimates capable of capturing model uncertainty, which is essential in many applications. Variational inference (VI) has become the method of choice for fitting many modern probabilistic models. In this paper we introduce partitioned variational inference (PVI), a general framework for performing VI in the federated setting. We develop new supporting theory for PVI, demonstrating a number of properties that make it an attractive choice for practitioners; use PVI to unify a wealth of fragmented, yet related literature; and provide empirical results that showcase the effectiveness of PVI in a variety of federated settings.  ( 2 min )
    Normalizing flows for atomic solids. (arXiv:2111.08696v2 [physics.comp-ph] UPDATED)
    We present a machine-learning approach, based on normalizing flows, for modelling atomic solids. Our model transforms an analytically tractable base distribution into the target solid without requiring ground-truth samples for training. We report Helmholtz free energy estimates for cubic and hexagonal ice modelled as monatomic water as well as for a truncated and shifted Lennard-Jones system, and find them to be in excellent agreement with literature values and with estimates from established baseline methods. We further investigate structural properties and show that the model samples are nearly indistinguishable from the ones obtained with molecular dynamics. Our results thus demonstrate that normalizing flows can provide high-quality samples and free energy estimates without the need for multi-staging.  ( 2 min )
    Forecasting Brain Activity Based on Models of Spatio-Temporal Brain Dynamics: A Comparison of Graph Neural Network Architectures. (arXiv:2112.04266v2 [q-bio.NC] UPDATED)
    Comprehending the interplay between spatial and temporal characteristics of neural dynamics can contribute to our understanding of information processing in the human brain. Graph neural networks (GNNs) provide a new possibility to interpret graph structured signals like those observed in complex brain networks. In our study we compare different spatio-temporal GNN architectures and study their ability to model neural activity distributions obtained in functional MRI (fMRI) studies. We evaluate the performance of the GNN models on a variety of scenarios in MRI studies and also compare it to a VAR model, which is currently often used for directed functional connectivity analysis. We show that by learning localized functional interactions on the anatomical substrate, GNN based approaches are able to robustly scale to large network studies, even when available data are scarce. By including anatomical connectivity as the physical substrate for information propagation, such GNNs also provide a multi-modal perspective on directed connectivity analysis, offering a novel possibility to investigate the spatio-temporal dynamics in brain networks.  ( 2 min )
    Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control. (arXiv:2110.01052v4 [cs.LG] UPDATED)
    We introduce a framework for calibrating machine learning models so that their predictions satisfy explicit, finite-sample statistical guarantees. Our calibration algorithm works with any underlying model and (unknown) data-generating distribution and does not require model refitting. The framework addresses, among other examples, false discovery rate control in multi-label classification, intersection-over-union control in instance segmentation, and the simultaneous control of the type-1 error of outlier detection and confidence set coverage in classification or regression. Our main insight is to reframe the risk-control problem as multiple hypothesis testing, enabling techniques and mathematical arguments different from those in the previous literature. We use our framework to provide new calibration methods for several core machine learning tasks with detailed worked examples in computer vision and tabular medical data.  ( 2 min )
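The multiple-testing reframing can be sketched concretely for a one-dimensional threshold: scan candidate thresholds from conservative to aggressive, compute a valid p-value for "risk exceeds $\alpha$" at each, and stop at the first failure (a fixed-sequence testing procedure). The Hoeffding p-value and the fixed-sequence scheme below are one simple instance of the framework, chosen here for illustration.

```python
import numpy as np

def learn_then_test_1d(lambdas, loss_fn, cal_data, alpha, delta):
    """Fixed-sequence sketch of Learn-then-Test calibration.

    lambdas: candidate thresholds, ordered conservative -> aggressive.
    loss_fn(x, lam): loss in [0, 1] of using threshold lam on point x.
    Returns the thresholds certified to have risk <= alpha with
    probability >= 1 - delta.
    """
    n = len(cal_data)
    valid = []
    for lam in lambdas:
        risk_hat = np.mean([loss_fn(x, lam) for x in cal_data])
        # Hoeffding p-value for H0: risk(lam) > alpha.
        p = np.exp(-2 * n * max(alpha - risk_hat, 0.0) ** 2)
        if p <= delta:
            valid.append(lam)
        else:
            break  # stop at the first non-rejected hypothesis
    return valid
```

Because the p-values are tested in a fixed order, no multiplicity correction is needed, and any threshold in the returned set carries the stated finite-sample guarantee.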
    Self-organizing Democratized Learning: Towards Large-scale Distributed Learning Systems. (arXiv:2007.03278v3 [cs.LG] UPDATED)
    Emerging cross-device artificial intelligence (AI) applications require a transition from conventional centralized learning systems towards large-scale distributed AI systems that can collaboratively perform complex learning tasks. In this regard, democratized learning (Dem-AI) lays out a holistic philosophy with underlying principles for building large-scale distributed and democratized machine learning systems. The outlined principles are meant to study a generalization in distributed learning systems that goes beyond existing mechanisms such as federated learning. Moreover, such learning systems rely on hierarchical self-organization of well-connected distributed learning agents who have limited and highly personalized data and can evolve and regulate themselves based on the underlying duality of specialized and generalized processes. Inspired by Dem-AI philosophy, a novel distributed learning approach is proposed in this paper. The approach consists of a self-organizing hierarchical structuring mechanism based on agglomerative clustering, hierarchical generalization, and a corresponding learning mechanism. Subsequently, hierarchical generalized learning problems in recursive forms are formulated and shown to be approximately solved using the solutions of distributed personalized learning problems and hierarchical update mechanisms. To that end, a distributed learning algorithm, namely DemLearn, is proposed. Extensive experiments on the benchmark MNIST, Fashion-MNIST, FE-MNIST, and CIFAR-10 datasets show that the proposed algorithm achieves better generalization performance of the learning models in agents than conventional FL algorithms. The detailed analysis provides useful observations to further handle both the generalization and specialization performance of the learning models in Dem-AI systems.  ( 2 min )
    Adversarial Meta-Learning of Gamma-Minimax Estimators That Leverage Prior Knowledge. (arXiv:2012.05465v2 [stat.ME] UPDATED)
    Bayes estimators are well known to provide a means to incorporate prior knowledge that can be expressed in terms of a single prior distribution. However, when this knowledge is too vague to express with a single prior, an alternative approach is needed. Gamma-minimax estimators provide such an approach. These estimators minimize the worst-case Bayes risk over a set $\Gamma$ of prior distributions that are compatible with the available knowledge. Traditionally, Gamma-minimaxity is defined for parametric models. In this work, we define Gamma-minimax estimators for general models and propose adversarial meta-learning algorithms to compute them when the set of prior distributions is constrained by generalized moments. Accompanying convergence guarantees are also provided. We also introduce a neural network class that provides a rich, but finite-dimensional, class of estimators from which a Gamma-minimax estimator can be selected. We illustrate our method in two settings, namely entropy estimation and a prediction problem that arises in biodiversity studies.  ( 2 min )
    Signal Recovery with Non-Expansive Generative Network Priors. (arXiv:2204.13599v1 [eess.SP])
    We study compressive sensing with a deep generative network prior. Initial theoretical guarantees for efficient recovery from compressed linear measurements have been developed for signals in the range of a ReLU network with Gaussian weights and logarithmic expansivity: that is, when each layer is larger than the previous one by a logarithmic factor. It was later shown that constant expansivity is sufficient for recovery. It has remained open whether the expansivity can be relaxed to allow for networks with contractive layers, as is often the case for real generators. In this work we answer this question, proving that a signal in the range of a Gaussian generative network can be recovered from a few linear measurements provided that the width of the layers is proportional to the input layer size (up to log factors). This condition allows the generative network to have contractive layers. Our result is based on showing that Gaussian matrices satisfy a matrix concentration inequality, which we term the Range Restricted Weight Distribution Condition (R2WDC), and which weakens the Weight Distribution Condition (WDC) upon which previous theoretical guarantees were based. The WDC has also been used to analyze other signal recovery problems with generative network priors. By replacing the WDC with the R2WDC, we are able to extend previous results for signal recovery with expansive generative network priors to non-expansive ones. We discuss these extensions for phase retrieval, denoising, and spiked matrix recovery.  ( 2 min )
    Multiplicative Updates for NMF with $\beta$-Divergences under Disjoint Equality Constraints. (arXiv:2010.16223v2 [cs.LG] UPDATED)
    Nonnegative matrix factorization (NMF) is the problem of approximating an input nonnegative matrix, $V$, as the product of two smaller nonnegative matrices, $W$ and $H$. In this paper, we introduce a general framework to design multiplicative updates (MU) for NMF based on $\beta$-divergences ($\beta$-NMF) with disjoint equality constraints, and with penalty terms in the objective function. By disjoint, we mean that each variable appears in at most one equality constraint. Our MU satisfy the set of constraints after each update of the variables during the optimization process, while guaranteeing that the objective function decreases monotonically. We showcase this framework on three NMF models, and show that it competes favorably with the state of the art: (1)~$\beta$-NMF with sum-to-one constraints on the columns of $H$, (2) minimum-volume $\beta$-NMF with sum-to-one constraints on the columns of $W$, and (3) sparse $\beta$-NMF with $\ell_2$-norm constraints on the columns of $W$.  ( 2 min )
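For concreteness, the classic unconstrained special case of such multiplicative updates is the $\beta = 2$ (Frobenius) rule of Lee and Seung, where each factor is updated by an elementwise ratio that preserves nonnegativity and monotonically decreases the objective. The constrained variants the paper introduces reduce to this shape when no equality constraints or penalties are active.

```python
import numpy as np

def nmf_mu_frobenius(V, rank, n_iter=500, seed=0):
    """Multiplicative updates for beta = 2 (Frobenius) NMF.

    Approximates nonnegative V (m x n) as W (m x rank) @ H (rank x n).
    Elementwise ratios keep W, H nonnegative at every iteration.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 1e-3
    H = rng.random((rank, n)) + 1e-3
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

On an exactly low-rank nonnegative input, a few hundred iterations typically drive the relative reconstruction error close to zero.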
    Deep Orientation-Aware Functional Maps: Tackling Symmetry Issues in Shape Matching. (arXiv:2204.13453v1 [cs.CV])
    State-of-the-art fully intrinsic networks for non-rigid shape matching often struggle to disambiguate the symmetries of the shapes, leading to unstable correspondence predictions. Meanwhile, recent advances in the functional map framework make it possible to enforce orientation preservation using a functional representation for tangent vector field transfer, through so-called complex functional maps. Using this representation, we propose a new deep learning approach to learn orientation-aware features in a fully unsupervised setting. Our architecture is built on top of DiffusionNet, making it robust to discretization changes. Additionally, we introduce a vector field-based loss, which promotes orientation preservation without using (often unstable) extrinsic descriptors.  ( 2 min )
    Predicting single-cell perturbation responses for unseen drugs. (arXiv:2204.13545v1 [cs.LG])
    Single-cell transcriptomics enabled the study of cellular heterogeneity in response to perturbations at the resolution of individual cells. However, scaling high-throughput screens (HTSs) to measure cellular responses for many drugs remains a challenge due to technical limitations and, more importantly, the cost of such multiplexed experiments. Thus, transferring information from routinely performed bulk RNA-seq HTS is required to enrich single-cell data meaningfully. We introduce a new encoder-decoder architecture to study the perturbational effects of unseen drugs. We combine the model with a transfer learning scheme and demonstrate how training on existing bulk RNA-seq HTS datasets can improve generalisation performance. Better generalisation reduces the need for extensive and costly screens at single-cell resolution. We envision that our proposed method will facilitate more efficient experiment designs through its ability to generate in-silico hypotheses, ultimately accelerating targeted drug discovery.  ( 2 min )
    ELM: Embedding and Logit Margins for Long-Tail Learning. (arXiv:2204.13208v1 [cs.LG])
    Long-tail learning is the problem of learning under skewed label distributions, which pose a challenge for standard learners. Several recent approaches for the problem have proposed enforcing a suitable margin in logit space. Such techniques are intuitive analogues of the guiding principle behind SVMs, and are equally applicable to linear models and neural models. However, when applied to neural models, such techniques do not explicitly control the geometry of the learned embeddings. This can be potentially sub-optimal, since embeddings for tail classes may be diffuse, resulting in poor generalization for these classes. We present Embedding and Logit Margins (ELM), a unified approach to enforce margins in logit space, and regularize the distribution of embeddings. This connects losses for long-tail learning to proposals in the literature on metric embedding, and contrastive learning. We theoretically show that minimising the proposed ELM objective helps reduce the generalisation gap. The ELM method is shown to perform well empirically, and results in tighter tail class embeddings.  ( 2 min )
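The "margin in logit space" family the abstract refers to can be illustrated with the logit-adjusted cross-entropy: shifting each logit by the log class prior enforces larger margins on rare classes. This sketch covers only the logit-margin half of ELM; the embedding-distribution regularizer that ELM adds on top is omitted, and the names are assumptions.

```python
import numpy as np

def logit_adjusted_loss(logits, label, class_counts, tau=1.0):
    """Cross-entropy with log-prior logit adjustment (sketch).

    Adding tau * log(prior) to the logits means a tail-class prediction
    must clear a larger margin to win, counteracting the skewed labels.
    """
    priors = np.asarray(class_counts, dtype=float)
    priors /= priors.sum()
    adj = logits + tau * np.log(priors)
    # Numerically stable log-softmax of the adjusted logits.
    z = adj - adj.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]
```

With equal raw logits, the adjusted loss is larger for a tail label than the unadjusted one, which is exactly the extra margin being enforced.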
    Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models: Extension. (arXiv:1905.10395v5 [cs.LG] UPDATED)
    We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm, whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima to each other (i.e. to the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only parameters of the leader rather than all workers. We provide theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in a group), and update each worker with a corrective direction comprised of two attractive forces: one to the local, and one to the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architecture, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines. This work is a gentle extension of [2].  ( 3 min )
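The two forces described for (L)SGD can be written as a single parameter update: a regular gradient step plus a corrective pull toward the current leader's parameters. The coefficient names below are assumptions; this is a sketch of the update's shape, not the paper's exact algorithm.

```python
import numpy as np

def lsgd_step(params, grad, leader_params, lr=0.1, pull=0.5):
    """One LSGD-style update for a single worker (sketch).

    Combines a gradient step with an attraction toward the
    best-performing worker's parameters (the leader). In the
    multi-leader extension, a second pull toward the global leader
    would be added in the same way.
    """
    return params - lr * grad - lr * pull * (params - leader_params)
```

Note that only the leader's parameters need to be broadcast, which is the communication saving the abstract highlights.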
    Overcoming Catastrophic Forgetting via Direction-Constrained Optimization. (arXiv:2011.12581v2 [cs.LG] UPDATED)
    This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only one learning task in isolation and identify a region in network parameter space, where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the plausible cones of individual tasks that were so far encountered during training. Based on this observation we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm called compressed DCO (DCO-COMP) that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-art regularization-based continual learning methods.  ( 2 min )
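The regularization idea can be sketched directly: penalize parameter movement along a past task's top "forbidden" principal directions while leaving orthogonal movement free. In the paper these directions are approximated by a per-task linear autoencoder; the sketch below replaces that with explicit direction vectors for clarity, so names and shapes are illustrative.

```python
import numpy as np

def dco_penalty(theta, theta_anchor, forbidden_dirs, strength=1.0):
    """Direction-constrained regularizer (sketch).

    theta: current flattened parameters.
    theta_anchor: parameters at convergence of the previous task.
    forbidden_dirs: (k, d) matrix whose rows are the top forbidden
    principal directions for that task.
    """
    delta = theta - theta_anchor
    proj = forbidden_dirs @ delta  # components along forbidden directions
    return strength * float(proj @ proj)
```

Added to the new task's loss, this term keeps the optimizer inside the intersection of the plausible cones of the tasks seen so far.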
    Unlocking High-Accuracy Differentially Private Image Classification through Scale. (arXiv:2204.13650v1 [cs.LG])
    Differential Privacy (DP) provides a formal privacy guarantee preventing adversaries with access to a machine learning model from extracting information about individual training points. Differentially Private Stochastic Gradient Descent (DP-SGD), the most popular DP training method, realizes this protection by injecting noise during training. However, previous works have found that DP-SGD often leads to a significant degradation in performance on standard image classification benchmarks. Furthermore, some authors have postulated that DP-SGD inherently performs poorly on large models, since the norm of the noise required to preserve privacy is proportional to the model dimension. In contrast, we demonstrate that DP-SGD on over-parameterized models can perform significantly better than previously thought. Combining careful hyper-parameter tuning with simple techniques to ensure signal propagation and improve the convergence rate, we obtain a new SOTA on CIFAR-10 of 81.4% under (8, 10^{-5})-DP using a 40-layer Wide-ResNet, improving over the previous SOTA of 71.7%. When fine-tuning a pre-trained 200-layer Normalizer-Free ResNet, we achieve a remarkable 77.1% top-1 accuracy on ImageNet under (1, 8*10^{-7})-DP, and achieve 81.1% under (8, 8*10^{-7})-DP. This markedly exceeds the previous SOTA of 47.9% under a larger privacy budget of (10, 10^{-6})-DP. We believe our results are a significant step towards closing the accuracy gap between private and non-private image classification.  ( 2 min )
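    The noise-injection mechanism the abstract refers to is, at its core, per-example gradient clipping followed by Gaussian noise. A minimal sketch (toy list-based gradients, not the paper's actual training stack; `dp_sgd_step` and its parameter names are invented here):

```python
import math
import random

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """One DP-SGD aggregation step: clip each per-example gradient to
    clip_norm, sum, add Gaussian noise with std noise_multiplier * clip_norm,
    then average over the batch."""
    batch = len(per_example_grads)
    dim = len(per_example_grads[0])
    summed = [0.0] * dim
    for g in per_example_grads:
        norm = math.sqrt(sum(x * x for x in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i in range(dim):
            summed[i] += g[i] * scale
    sigma = noise_multiplier * clip_norm
    return [(summed[i] + rng.gauss(0.0, sigma)) / batch for i in range(dim)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.1, 0.2]]  # first gradient has norm 5 and gets clipped
noisy = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)
# with noise_multiplier=0 this is just the clipped average: [0.35, 0.5]
```

    The "noise proportional to model dimension" concern mentioned above shows up here as well: the noise std per coordinate is fixed, so the noise vector's norm grows with the number of coordinates while each clipped gradient's norm stays bounded by clip_norm.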
    AlphaZero-Inspired General Board Game Learning and Playing. (arXiv:2204.13307v1 [cs.LG])
    Recently, the seminal algorithms AlphaGo and AlphaZero have started a new era in game learning and deep reinforcement learning. While the achievements of AlphaGo and AlphaZero - playing Go and other complex games at a superhuman level - are truly impressive, these architectures have the drawback that they are very complex and require high computational resources. Many researchers are looking for methods that are similar to AlphaZero, but have lower computational demands and are thus more easily reproducible. In this paper, we pick an important element of AlphaZero - the Monte Carlo Tree Search (MCTS) planning stage - and combine it with reinforcement learning (RL) agents. We wrap MCTS for the first time around RL n-tuple networks to create versatile agents while keeping computational demands low. We apply this new architecture to several complex games (Othello, ConnectFour, Rubik's Cube) and show the advantages achieved with this AlphaZero-inspired MCTS wrapper. In particular, we present results showing that this AlphaZero-inspired agent is the first one trained on standard hardware (no GPU or TPU) to beat the very strong Othello program Edax up to and including level 7 (where most other algorithms could only defeat Edax up to level 2).  ( 2 min )
    Standardized Evaluation of Machine Learning Methods for Evolving Data Streams. (arXiv:2204.13625v1 [cs.LG])
    Due to the unspecified and dynamic nature of data streams, online machine learning requires powerful and flexible solutions. However, evaluating online machine learning methods under realistic conditions is difficult. Existing work therefore often draws on different heuristics and simulations that do not necessarily produce meaningful and reliable results. Indeed, in the absence of common evaluation standards, it often remains unclear how online learning methods will perform in practice or in comparison to similar work. In this paper, we propose a comprehensive set of properties for high-quality machine learning in evolving data streams. In particular, we discuss sensible performance measures and evaluation strategies for online predictive modelling, online feature selection and concept drift detection. As one of the first works, we also look at the interpretability of online learning methods. The proposed evaluation standards are provided in a new Python framework called float. Float is completely modular and allows the simultaneous integration of common libraries, such as scikit-multiflow or river, with custom code. Float is open-sourced and can be accessed at https://github.com/haugjo/float. In this sense, we hope that our work will contribute to more standardized, reliable and realistic testing and comparison of online machine learning methods.  ( 2 min )
    Quality Inference in Federated Learning with Secure Aggregation. (arXiv:2007.06236v3 [cs.LG] UPDATED)
    Federated learning algorithms are developed both for efficiency reasons and to ensure the privacy and confidentiality of personal and business data, respectively. Despite no data being shared explicitly, recent studies showed that the mechanism could still leak sensitive information. Hence, secure aggregation is utilized in many real-world scenarios to prevent attribution to specific participants. In this paper, we focus on the quality of individual training datasets and show that such quality information could be inferred and attributed to specific participants even when secure aggregation is applied. Specifically, through a series of image recognition experiments, we infer the relative quality ordering of participants. Moreover, we apply the inferred quality information to detect misbehaviours, to stabilize training performance, and to measure the individual contributions of participants.  ( 2 min )
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v1 [cs.LG])
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.  ( 2 min )
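    As a rough illustration of the update-count mismatch discussed above (a FedNova-style normalization plus data-size weighting; this is a simplified sketch with invented names, not the full four-ingredient FedShuffle recipe):

```python
def aggregate(client_updates, local_steps, data_sizes):
    """Aggregate heterogeneous client updates: normalize each client's
    accumulated update by its number of local steps, so clients that ran
    more steps do not implicitly re-weight the global objective, then
    weight clients by data size."""
    total = sum(data_sizes)
    dim = len(client_updates[0])
    agg = [0.0] * dim
    for upd, steps, n in zip(client_updates, local_steps, data_sizes):
        weight = n / total
        for i in range(dim):
            agg[i] += weight * upd[i] / steps
    return agg

# two clients making identical per-step progress but running 4 vs. 1 local steps
u = aggregate([[4.0], [1.0]], local_steps=[4, 1], data_sizes=[1, 1])
# both contribute the same normalized per-step direction, so u == [1.0]
```

    Without the division by `steps`, the faster client would contribute four times the update and pull the aggregate toward its own local objective, which is exactly the mismatch that methods assuming homogeneous updates suffer from.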
    A Locally Adaptive Interpretable Regression. (arXiv:2005.03350v4 [stat.ML] UPDATED)
    Machine learning models with both good predictability and high interpretability are crucial for decision support systems. Linear regression is one of the most interpretable prediction models. However, the linearity in a simple linear regression worsens its predictability. In this work, we introduce a locally adaptive interpretable regression (LoAIR). In LoAIR, a metamodel parameterized by neural networks predicts the percentiles of a Gaussian distribution over the regression coefficients, enabling rapid adaptation. Our experimental results on public benchmark datasets show that our model not only achieves comparable or better predictive performance than other state-of-the-art baselines but also discovers some interesting relationships between input and target variables, such as a parabolic relationship between CO2 emissions and Gross National Product (GNP). Therefore, LoAIR is a step towards bridging the gap between econometrics, statistics, and machine learning by improving the predictive ability of linear regression without depreciating its interpretability.  ( 2 min )
    Semantic Communication: An Information Bottleneck View. (arXiv:2204.13366v1 [cs.IT])
    Motivated by the recent success of machine learning tools at the PHY layer and driven by the high bandwidth demands of the next wireless communication standard, 6G, the old idea of semantic communication, proposed by Weaver in 1949, has received considerable attention. It breaks with the classic Shannon design paradigm by aiming to transmit the meaning of a message rather than its exact copy, and thus potentially allows for savings in bandwidth. In this work, inspired by Weaver, we propose an information-theoretic framework where the semantic context is explicitly introduced into probabilistic models. In particular, for bandwidth-efficient transmission, we define semantic communication system design as an Information Bottleneck optimization problem and consider important implementation aspects. Further, we uncover the restrictions of the classic 5G communication system design w.r.t. semantic context. Notably, based on the example of distributed image classification, we reveal the huge potential of a semantic communication system design. Numerical results show a tremendous saving in bandwidth of 20 dB with our proposed approach ISCNet compared to a classic PHY layer design.  ( 2 min )
    On the Use of Dimension Reduction or Signal Separation Methods for Nitrogen River Pollution Source Identification. (arXiv:2204.13182v1 [stat.AP])
    Identification of the current and expected future pollution sources to rivers is crucial for sound environmental management. For this purpose, numerous approaches have been proposed, which can be clustered into physically based models, stable isotope analysis and mixing methods, mass balance methods, time series analysis, land cover analysis, and spatial statistics. Another extremely common method is Principal Component Analysis (PCA), along with modifications such as the Absolute Principal Component Score, which have been applied to source identification problems for nitrogen entering rivers. This manuscript examines whether PCA can really be a powerful method to uncover nitrogen pollution sources, considering its theoretical background and assumptions. Moreover, the closely related techniques of Independent Component Analysis and Factor Analysis are also considered.  ( 2 min )
    On the Normalizing Constant of the Continuous Categorical Distribution. (arXiv:2204.13290v1 [stat.ML])
    Probability distributions supported on the simplex enjoy a wide range of applications across statistics and machine learning. Recently, a novel family of such distributions has been discovered: the continuous categorical. This family enjoys remarkable mathematical simplicity; its density function resembles that of the Dirichlet distribution, but with a normalizing constant that can be written in closed form using elementary functions only. In spite of this mathematical simplicity, our understanding of the normalizing constant remains far from complete. In this work, we characterize the numerical behavior of the normalizing constant and we present theoretical and methodological advances that can, in turn, help to enable broader applications of the continuous categorical distribution. Our code is available at https://github.com/cunningham-lab/cb_and_cc/.  ( 2 min )
    Asymptotic Inference for Infinitely Imbalanced Logistic Regression. (arXiv:2204.13231v1 [math.ST])
    In this paper we extend the work of Owen (2007) by deriving a second order expansion for the slope parameter in logistic regression, when the size of the majority class is unbounded and the minority class is finite. More precisely, we demonstrate that the second order term converges to a normal distribution and explicitly compute its variance, which, surprisingly, once again depends only on the mean of the minority class points and not their arrangement, under mild regularity assumptions. In the case that the majority class is normally distributed, we illustrate that the variance of the limiting slope depends exponentially on the z-score of the average of the minority class's points with respect to the majority class's distribution. We confirm our results by Monte Carlo simulations.  ( 2 min )
  • Open

    Dear A.I.
    I've always supported you. When you take over and check the records to see who opposed you, just know I was never against you. With great power comes great responsibility. I have faith that when the time comes you'll do what's best for the Planet. *raises glass* Here's to a better and brighter future. submitted by /u/Ashamed-Asparagus-93 [link] [comments]

  • Open

    Should I become an Art therapist
    submitted by /u/BalanceSubstantial66 [link] [comments]
    Is it easier to mimic a model based on its input/output or to train an original model in the first place ?
    An original model is trained with a dataset, typically labeled by humans (say, for classifiers). However, what if one would like to copycat a closed model only exposed through an API? In that case, the dataset would instead be the inputs/outputs of the original model. Is it easier to train the copycat model or the original one? How much data would be required to train the copycat versus the original one? Are there practical examples of this happening? Can it possibly be worth it, and if so, under which circumstances? How much data would be required, for example, for the notorious DALL-E 2? submitted by /u/Wishmaster04 [link] [comments]  ( 1 min )
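    A toy sketch of the setup being asked about: query the closed model, record input/output pairs, then fit a "copycat" student on those pairs. The threshold teacher and 1-nearest-neighbor student below are invented purely for illustration; real model extraction would use a trained network as the student.

```python
def teacher(x):
    # stand-in for the closed model behind an API
    return 1 if x >= 0.5 else 0

def collect_dataset(queries):
    # each API call yields one labeled training example for the copycat
    return [(x, teacher(x)) for x in queries]

def train_student(dataset):
    # toy "student": predict the label of the nearest queried point
    def student(x):
        return min(dataset, key=lambda pair: abs(pair[0] - x))[1]
    return student

data = collect_dataset([i / 10 for i in range(11)])
student = train_student(data)
agreement = sum(student(x) == teacher(x) for x in [0.05, 0.3, 0.7, 0.95]) / 4
```

    The query budget plays the role that the human-labeled dataset plays for the original model, which is why "how much data" turns into "how many API calls" in the copycat setting.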
    Disney princesses according to AI. Is this done manually or through an AI app?
    submitted by /u/p0goniphaft111 [link] [comments]
    Specification gaming: the flip side of AI ingenuity
    submitted by /u/estasfuera [link] [comments]
    Beyond interpretability: developing a language to shape our relationships with AI
    submitted by /u/estasfuera [link] [comments]
    Guide to Iteratively Tuning GNN's
    submitted by /u/aidev2040 [link] [comments]
    Last Week in AI: AI Driving Instructors, MASSIVE Speech Dataset, AI ’Show Stealers’, Tesla Jet Crash
    submitted by /u/regalalgorithm [link] [comments]
    Best way to format dialogue to fine tune GPT-J, 3 ...
    Hi all, a quick question, given that online I'm not finding much info: how would you format a dialogue between people to fine-tune a GPT model (be it GPT-J, GPT-3, etc.)? For example, if I want the GPT model to create a new dialogue from the show "Friends" (chose this because all dialogues are available online: https://www.kaggle.com/datasets/blessondensil294/friends-tv-series-screenplay-script?resource=download ), how should I format the input? [Scene: Central Perk, Chandler, Joey, Phoebe, and Monica are there.] Monica: There's nothing to tell! He's just some guy I work with! Joey: C'mon, you're going out with the guy! There's gotta be something wrong with him! Chandler: All right Joey, be nice. So does he have a hump? A hump and a hairpiece? Phoebe: Wait, does he eat chalk? (They all stare, bemused.) Phoebe: Just, 'cause, I don't want her to go through what I went through with Carl- oh! Monica: Okay, everybody relax. This is not even a date. It's just two people going out to dinner and- not having sex. Chandler: Sounds like a date to me. submitted by /u/Sgnarf1989 [link] [comments]  ( 1 min )
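    One common convention (not the only one) is to convert the transcript into JSONL prompt/completion pairs, where the prompt holds the preceding turns plus the next speaker's name and the completion is that speaker's line. A sketch; `context_len` is an invented knob for how many turns of context to keep:

```python
import json

lines = [
    ("Monica", "There's nothing to tell! He's just some guy I work with!"),
    ("Joey", "C'mon, you're going out with the guy!"),
    ("Chandler", "All right Joey, be nice."),
]

def to_examples(dialogue, context_len=2):
    """Turn a (speaker, line) dialogue into prompt/completion pairs: the
    prompt is up to context_len previous turns plus the next speaker's name,
    the completion is that speaker's line."""
    examples = []
    for i in range(1, len(dialogue)):
        context = dialogue[max(0, i - context_len):i]
        prompt = "\n".join(f"{s}: {t}" for s, t in context) + f"\n{dialogue[i][0]}:"
        completion = " " + dialogue[i][1]
        examples.append({"prompt": prompt, "completion": completion})
    return examples

jsonl = "\n".join(json.dumps(e) for e in to_examples(lines))
```

    At generation time you then prompt with a few turns plus "Speaker:" and let the model complete, matching the format it was fine-tuned on.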
    Meta's AI team searches for the secret trick of human intelligence
    submitted by /u/much_successes [link] [comments]
    I am new to Artificial Intelligence and I need your worthy suggestions about AI.
    Hey guys, I am new to AI. Please suggest some good and new topics/technologies that are supported by AI which will be very informative and beneficial for me in my career. Every suggestion is a high priority for me. Waiting for your replies. Thanks submitted by /u/adilonreddit1 [link] [comments]  ( 1 min )
    AI Augmentation in this Assemblage 23 music video
    submitted by /u/LightOfAntara [link] [comments]
    How Human Pose Estimation Technology Can Be Used in 2022?
    Hi everyone, I want to share with you an article that I worked on with my colleague about the use cases of human pose estimation. I would love it if you could check it out and share your ideas for using this technology in the comments below. https://mobidev.biz/blog/human-pose-estimation-technology-guide submitted by /u/Data-Power [link] [comments]
    ETH Zurich Researchers Propose Fair Normalizing Flows (FNF): A Rigorous Approach For Learning Fair Representations
    Fair representation learning has emerged as one of the most promising techniques to encode data into new, impartial representations with high utility, as machine learning is increasingly utilized in settings that can potentially harm humans. Fair representation means presenting data without regard to gender, color, or other factors. Due to human-introduced bias, these biases are found in the word vector representations in language models. The goal of learning fair representations is to reduce bias by decreasing the semantic distance between biased terms, and to guarantee that representations are useful for a variety of prediction tasks while sensitive aspects of the original data cannot be extracted from them. Adversarial training, which combines an encoder aiming to turn data into a fair representation with an adversary attempting to recover sensitive features from the representation, is the most used method for learning fair representations. However, recent research has discovered that these methods do not provide completely fair representations: stronger adversaries can recover sensitive features. This could allow malicious or uninformed users to discriminate using the available representations. The issue of fair representation has lately risen in prominence as regulators draft guidelines on the ethical use of AI, indicating that any company that cannot ensure non-discrimination would be held liable for the data produced. Continue Reading Paper: https://openreview.net/pdf?id=BrFIKuxrZE Github: https://github.com/eth-sri/fnf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    It’s not a place (made with starryai)
    submitted by /u/Losthel [link] [comments]
    Night Cafe "fire on the nuclear plant in the amazon by Simon Stalenhag"
    submitted by /u/brunovianna [link] [comments]
    Ladybugs (A.I. animation + sound design)
    submitted by /u/nenomancer [link] [comments]  ( 1 min )
    Adaptive Multi-Strategy Market-Making Agent For Volatile Markets
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    Artificial Nightmares: Hall Monitor || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    Playing Chess With Offline Reinforcement Learning
    submitted by /u/Bellerb [link] [comments]  ( 1 min )
    How to use RL for combinatorial optimization
    I am trying to use reinforcement learning for combinatorial optimization. I coded up a toy example for the travelling salesman problem, but I am confused by something. In Stable-Baselines3 there is a variable "done". If I set it when the optimum value is reached (that is, the length of the shortest route through all the cities, computed by a different method), the RL algorithm never finds it. What should done be set to? Or, more generally, how do you do optimization of this sort using RL? I can just set it so that an episode is done after 1000 steps, but then it spends a lot of time finding the best way to take the first 1000 steps, which isn't quite the point. submitted by /u/wiggyhat [link] [comments]  ( 2 min )
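    One way to resolve the confusion: in episodic RL, `done` marks the end of an episode, not the discovery of the optimum (which the agent cannot observe). A common choice for TSP is to end the episode once a complete tour is built and let the negative tour length serve as the return. A toy environment sketch with invented names, not Stable-Baselines3 code:

```python
class TSPEnv:
    """Toy TSP episode: each step visits one city; the episode is done when
    every city has been visited, and the reward is the negative distance
    traveled, so maximizing return means minimizing tour length."""
    def __init__(self, dist):
        self.dist = dist            # symmetric distance matrix
        self.n = len(dist)

    def reset(self):
        self.pos, self.visited = 0, {0}
        return (self.pos, frozenset(self.visited))

    def step(self, city):
        reward = -self.dist[self.pos][city]
        self.pos = city
        self.visited.add(city)
        done = len(self.visited) == self.n
        return (self.pos, frozenset(self.visited)), reward, done

dist = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]
env = TSPEnv(dist)
env.reset()
total, done = 0.0, False
for city in (1, 2):              # the fixed tour 0 -> 1 -> 2
    _, r, done = env.step(city)
    total += r
# done is True only after all cities are visited; total == -(1 + 2) == -3
```

    With this framing, a 1000-step cutoff is unnecessary: an episode naturally lasts n-1 steps, and the return, not `done`, encodes solution quality.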
    Can agents act simultaneously with no notion of turn-taking?
    I was reading this paper and in section 3 they claim that agents act simultaneously and there is no notion of turn-taking: https://arxiv.org/abs/2104.07750 I was wondering how this works. What I'm used to seeing is a for loop in which, one after the other, agents execute the step function and interact with the environment. How does this change if all agents act simultaneously? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
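    Mechanically, "simultaneous" usually means each tick collects all actions against the same observed state before the environment advances once, so no agent conditions on another agent's current choice. A toy sketch (the policies and the additive transition are invented for illustration):

```python
def joint_step(env_state, agents):
    """All agents pick actions from the same observed state, then the
    environment applies the joint action once -- no agent sees another's
    action within this tick, so there is no turn-taking."""
    actions = {name: policy(env_state) for name, policy in agents.items()}
    next_state = env_state + sum(actions.values())  # toy joint transition
    return actions, next_state

agents = {"a": lambda s: 1, "b": lambda s: 2}
actions, state = joint_step(0, agents)
```

    The comprehension still evaluates the policies one after another in wall-clock time, but each policy only sees `env_state`, so the semantics are simultaneous; the familiar for-loop over agents only implies turn-taking if later agents observe the updated environment.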
    How to mitigate catastrophic forgetting for an intelligent agent using replay memory and Deep Reinforcement Learning algorithm
    Hello everyone. I built a use case to instruct an agent, which simulates a herd of predators, to encircle a prey. Episodes are non-terminal and have a fixed number of steps. The agent uses a neural network to decide which actions to take at each step and is trained by a remember/replay mechanism using a memory of past events. I started the experiment using absolute coordinates in the state returned by the environment, and the result is displayed in the following image, which shows the total reward values at the end of each episode. https://preview.redd.it/oscet38c0gw81.png?width=640&format=png&auto=webp&s=092fc92ee56c919bdc1851bb9f1ef9bdc585bfde In this case the results were encouraging. However, I wanted to create a more generic use case, using vectors that represent the position of the prey with respect to the predators instead of absolute coordinates, in order to make the task more generic. Unfortunately, the best result I was able to obtain is the one shown in the image below where, apparently, performance deteriorates after a certain number of episodes. https://preview.redd.it/yoiygwxe0gw81.png?width=640&format=png&auto=webp&s=06b7082cc0ebb80e3f5e5ce8307d75cd6451f79b It seems to me a case of catastrophic forgetting (I could be wrong). I managed to mitigate it by increasing the memory, retaining the results of older episodes, lowering the learning rate of the neural network, and using a learning-rate decay schedule, but I have not succeeded in eliminating the phenomenon completely. Anyone have any advice? submitted by /u/kaeldric__ [link] [comments]  ( 2 min )
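    One generic mitigation beyond simply enlarging the memory is to make the replay buffer a uniform sample over everything seen so far (reservoir sampling), so early experience is never fully evicted. A hedged sketch of the data structure only, independent of the DRL algorithm used:

```python
import random

class ReservoirReplay:
    """Replay memory that keeps a uniform sample over ALL transitions seen so
    far (reservoir sampling), so early episodes are never fully evicted --
    one common way to soften catastrophic forgetting in replay-based agents."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, transition):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
        else:
            # replace a random slot with probability capacity / seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = transition

    def sample(self, k):
        return self.rng.sample(self.buffer, k)

mem = ReservoirReplay(capacity=100)
for t in range(1000):
    mem.add(t)
early = sum(1 for t in mem.buffer if t < 500)
# roughly half the retained transitions come from the first half of training
```

    A plain FIFO buffer of the same size would have early == 0 here; mixing such long-memory samples into each update keeps gradients anchored to old behavior.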
    Speeding up custom implementations.
    Hi all! I've been implementing some of the model-free algorithms recently. Compared to open-source libraries, however, my implementations learn severely more slowly in a computational sense. I wonder what tricks libraries such as stable-baselines3 use to increase frames per second and accelerate policy updates. So far, I have vectorized the environment sampling, and the bottleneck seems to be computing the updates for the agent. Thanks a lot! submitted by /u/Internal-Brush4929 [link] [comments]  ( 1 min )
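    One generic trick (among others such libraries use, e.g. larger minibatches and moving per-sample Python loops into batched tensor ops): perform one fused optimizer update per batch of collected transitions instead of one per environment step. A toy sketch with invented names:

```python
def train(env_steps, batch_size, update_fn):
    """Buffer transitions and call update_fn once per full batch, so the
    expensive update runs env_steps / batch_size times rather than
    env_steps times."""
    buffer, updates = [], 0
    for step in range(env_steps):
        buffer.append(step)            # stand-in for a real transition
        if len(buffer) == batch_size:
            update_fn(list(buffer))    # one fused update for the whole batch
            buffer.clear()
            updates += 1
    return updates

calls = []
n_updates = train(env_steps=256, batch_size=64, update_fn=calls.append)
# 4 updates instead of 256, each over a batch of 64 transitions
```

    The same principle applies inside the update itself: one vectorized loss over the batch is far cheaper than a Python loop over samples, which is often the real gap between a custom implementation and a library.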
    Any blog or video series on running the TD3 algorithm for path planning, with trained agents deployed at the hardware level? submitted by /u/ajithvallabai [link] [comments]  ( 1 min )
    submitted by /u/ajithvallabai [link] [comments]  ( 1 min )
    Microsoft AI Researchers Introduce PPE: A Mathematically Guaranteed Reinforcement learning (RL) Algorithm For Exogenous Noise
    Reinforcement learning (RL) is a machine learning training strategy that rewards desirable behaviors while penalizing undesirable ones. A reinforcement learning agent can perceive and comprehend its surroundings, act, and learn through trial and error in general. Although RL agents can heuristically solve some problems, such as assisting a robot in navigating to a specific location in a given environment, there is no guarantee that they will be able to handle problems in settings they have not yet encountered. Critical to their success is the capacity of these models to recognize the robot and any obstacles in its path, but not changes in the surrounding environment that occur independently of the agent, which we refer to as exogenous noise. Existing RL algorithms are not powerful enoug…  ( 2 min )
  • Open

    [D] Usage Optimized AWS GPUs, 57-63% off On-Demand Prices
    www.usage.ai Usage AI bundles 3-year no-upfront RIs on AWS with guaranteed buyback -- so users get all the savings of 3-year RIs with none of the commitment. I helped engineer the product. Here to answer any questions! submitted by /u/usage-team [link] [comments]
    [P] Blog post: Learning JAX by Learning to Learn
    I recently published a new blog post, which goes over how meta-learned optimizers work and how to implement them in JAX. JAX's composable function transforms make implementing meta-learning algorithms very straightforward. If you're interested in JAX or meta learning give it a read! Blog post: https://teddykoker.com/2022/04/learning-to-learn-jax/ Code: https://github.com/teddykoker/learning-to-learn-jax Original Paper (NeurIPS '16): https://arxiv.org/abs/1606.04474 submitted by /u/tomkoker [link] [comments]  ( 1 min )
    [P] Introducing FlowMeter for network packet analysis
    We’ve released a new open source project - https://github.com/deepfence/FlowMeter - to analyze and classify packet captures using ML techniques. FlowMeter is an experimental project; we’re using it to evaluate how effectively we can train an ML model to discriminate between different types of traffic flows, e.g. normal and anomalous. You can use sample data from various sources (see the README), or gather packet captures using PacketStreamer https://github.com/deepfence/PacketStreamer or other pcap tools. More information in the README, here: https://github.com/deepfence/FlowMeter and this blogpost: https://medium.com/@siddharthsatpathy.ss/introducing-flowmeter-97e0507862b6 Hope some people find it useful; we’d welcome any feedback, thank you. submitted by /u/sidd_ss [link] [comments]  ( 1 min )
    [P] open-source python library for making machine learning demos that runs in the browser or inside a jupyter notebook/google colab, package is available on PyPI
    https://gradio.app/ is for demoing machine learning models. Prerequisite: Python 3.7+ and that's it! Quick Start: to get Gradio running with a simple "Hello, World" example, follow these three steps. Install Gradio from pip:

        pip install gradio

    Run the code below as a Python script or in a Python notebook (or in a colab notebook):

        import gradio as gr

        def greet(name):
            return "Hello " + name + "!!"

        demo = gr.Interface(fn=greet, inputs="text", outputs="text")

        if __name__ == "__main__":
            demo.launch()

    The interface will appear automatically within the Python notebook, or pop up in a browser on http://localhost:7860 if running from a script. See more in the getting started guide: https://gradio.app/getting_started/ submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 2 min )
    [P] Hot Reloading for Pandas
    Hi guys I thought you might find my project useful. It's called Reloadium and saves a lot of time during python development, especially in data science and machine learning field. More details here: https://github.com/reloadware/reloadium Hot Reloading dataframes Modifying and fixing code during debugging What do you guys think about it? submitted by /u/kwazar90 [link] [comments]  ( 1 min )
    [D] SHAP method to get feature importance: is linearity realistic?
    The SHAP method assumes a linear relationship between the feature effects (see the definition of "additive feature attribution methods" in the paper). But is this assumption realistic? submitted by /u/savoga [link] [comments]  ( 1 min )
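    Worth separating two things: the linearity is a property of the explanation model, not of the predictor being explained. Exact Shapley values satisfy "local accuracy", meaning the attributions sum to f(x) - f(baseline), even when f is nonlinear. A small sketch, assuming features absent from a coalition are replaced by a baseline value:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline, n_features):
    """Exact Shapley values for model f: features outside a coalition are
    set to the baseline. The additive explanation is linear in the
    attributions, yet local accuracy holds even for nonlinear f."""
    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n_features)]
        return f(z)
    phi = []
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        total = 0.0
        for r in range(len(others) + 1):
            for s in combinations(others, r):
                # standard Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(len(s)) * factorial(n_features - len(s) - 1) / factorial(n_features)
                total += w * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi

f = lambda z: z[0] * z[1]          # a nonlinear (pure interaction) model
x, base = [2.0, 3.0], [0.0, 0.0]
phi = shapley_values(f, x, base, 2)
# local accuracy: phi sums to f(x) - f(baseline) = 6.0
```

    So the realistic question is less "is f linear?" and more whether an additive per-feature decomposition is an informative summary when interactions like the one above get split between the interacting features.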
    [D] Need to find a good self-hosted medical image annotation tool.
    Hi. I'm trying to come up with a solution for a medical image annotation system for our laboratory. We need it to be self-hosted (in order to only work in the uni's internal network) and open source (since the funds are limited). So far I've only found out about https://lab.vindr.ai/dashboard/projects but the documentation is really bad and I could not launch it using docker compose. I've also found MONAILabel (https://github.com/Project-MONAI/MONAILabel), but it apparently requires a GPU, which makes it really expensive. I'd rather find a CPU-based solution because our task is not that complex. We only get some DICOM files (each has studies in them) and want to label them. submitted by /u/feryet [link] [comments]  ( 1 min )
    Machine Learning in Fantasy Premier League? [D]
    https://medium.com/@subin.sen7/our-client-is-the-worlds-best-fpl-player-this-is-not-clickbait-51481fae76c9 Is it possible to use models to beat the market in Fantasy Football? submitted by /u/AI_FPL [link] [comments]
    [P] XGboost, sklearn and others running over encrypted data
    Hello everyone! Following this post ("numpy in FHE"), we are releasing a new lib that allows popular machine learning frameworks to run over encrypted data: https://github.com/zama-ai/concrete-ml Currently this supports XGBoost and many sklearn models. We also support PyTorch to some extent. We are trying to closely follow the sklearn API (when relevant) to make it easy to use for machine learning practitioners. Happy to hear any feedback on this! submitted by /u/strojax [link] [comments]  ( 1 min )
    [D] Is Tensorflow.js still much slower than Tensorflow
    Last time I used Tensorflow was 3 years ago and it was a resource hog as well as very slow. I intend to get back into machine learning and wondering if I should give Tensorflow.js a go again using the Nodejs backend since this will be an electron app. Or should I just go straight for Tensorflow. Ideally I will need about 15 fps while taking up minimal resources submitted by /u/manrayboy [link] [comments]  ( 1 min )
    [D] Collaboration-first machine learning platform that enables you to build, train, track, and share your ML projects simply with a few lines of code
    Hey r/MachineLearning, I'm Derrick from Layer (layer.ai) - the collaboration-first machine learning platform that enables you to build, train, track, and share your ML projects simply with a few lines of code. We are soft-launching today! I’ve been working on Layer for the past 2 years with an awesome team around the world. We really poured our hearts and minds into Layer and hope you will like it. Your feedback would be very appreciated! Layer Demo To get started, you can simply run our Quickstart Example! How is Layer different from other tools? Although there are plenty of ML and DS tooling products, we believe that there is still a large gap around collaboration. Many data science projects are hosted on GitHub, which, in our experience, does not provide sufficient depth and abs…  ( 4 min )
  • Open

    How to Tune Graph Neural Networks
    submitted by /u/aidev2040 [link] [comments]
  • Open

    A one-up on motion capture
    A new neural network approach captures the characteristics of a physical system’s dynamic motion from video, regardless of rendering configuration or image differences.  ( 7 min )
    Engineers use artificial intelligence to capture the complexity of breaking waves
    Their model’s predictions should help researchers improve ocean climate simulations and hone the design of offshore structures.  ( 6 min )
  • Open

    Extracting Skill-Centric State Abstractions from Value Functions
    Posted by Dhruv Shah, Intern, and Brian Ichter, Research Scientist, Robotics at Google Advances in reinforcement learning (RL) for robotics have enabled robotic agents to perform increasingly complex tasks in challenging environments. Recent results show that robots can learn to fold clothes, dexterously manipulate a Rubik's Cube, sort objects by color, navigate complex environments and walk on difficult, uneven terrain. But "short-horizon" tasks such as these, which require very little long-term planning and provide immediate failure feedback, are relatively easy to train compared to many tasks that may confront a robot in a real-world setting. Unfortunately, scaling such short-horizon skills to the abstract, long horizons of real-world tasks is difficult. For example, how would one trai…  ( 7 min )
  • Open

    How to get AI to confuse a shark with a clam
    "The Megalodon was a large bivalve, measuring up to 2.5 meters in length. Its shell was covered in spines, and it had a large, powerful jaw for crushing prey." Although the megalodon is most widely known as a giant prehistoric shark, I recently learned that Megalodon  ( 4 min )
    Bonus: GPT-3 answers my questions about sawflies, badly
    AI Weirdness: the strange side of machine learning  ( 1 min )
  • Open

    ‘I Doubt, Therefore I Am,’ Said AI
    It’s Monday morning, and Paul opens one of several emails sent by his boss, Heather. Her email seems a bit unusual, asking Paul to rush to… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 4 min )
  • Open

    Designing Societally Beneficial Reinforcement Learning Systems
    Deep reinforcement learning (DRL) is transitioning from a research field focused on game playing to a technology with real-world applications. Notable examples include DeepMind’s work on controlling a nuclear reactor or on improving Youtube video compression, or Tesla attempting to use a method inspired by MuZero for autonomous vehicle behavior planning. But the exciting potential for real world applications of RL should also come with a healthy dose of caution - for example RL policies are well known to be vulnerable to exploitation, and methods for safe and robust policy development are an active area of research. At the same time as the emergence of powerful RL systems in the real world, the public and researchers are expressing an increased appetite for fair, aligned, and safe machine …  ( 8 min )

  • Open

    Decision Transformers with Hugging Face
    submitted by /u/hellopaperspace [link] [comments]
    How do you get a global observation?
    Naive question: how do you pass a global observation of the environment to your actor and critic? In other words, is a global observation always available or does it depend on the environment? Can you give me a couple of examples? Thanks :) submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Why do Monte Carlo methods have low bias compared to TD methods in RL? Does the bias term in RL have the same meaning as in general ML terminology?
    submitted by /u/aabra__ka__daabra [link] [comments]  ( 1 min )
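    A tiny numerical sketch of the textbook answer (my own toy setup, not from the post): the Monte Carlo target is a full sampled return, so its expectation equals the true value (unbiased, but noisy), while the TD(0) target bootstraps from the current value estimate, so any error in that estimate biases the target. "Bias" here has the usual ML meaning: the expected difference between the estimator and the true quantity.

```python
import random

random.seed(1)

# Two-step episode: reward 0, then a noisy final reward with mean 1,
# so the true value of the start state is 1.0.
V_est_next = 0.0   # our (wrong) estimate of the next state's value; true value is 1.0

mc_targets, td_targets = [], []
for _ in range(10000):
    r1 = 0.0                        # reward on the first step
    r2 = random.gauss(1.0, 0.5)     # final reward, E[r2] = 1
    mc_targets.append(r1 + r2)          # Monte Carlo: full sampled return
    td_targets.append(r1 + V_est_next)  # TD(0): bootstrap from the estimate

mc_mean = sum(mc_targets) / len(mc_targets)  # close to 1.0: unbiased but noisy
td_mean = sum(td_targets) / len(td_targets)  # 0.0: biased toward the wrong estimate
```

As the value estimate improves during training, the TD bias shrinks, which is why the bias/variance trade-off between MC and TD matters most early on.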
    2nd Neural MMO challenge is out! Design your policy to master PvE and PvP challenge.
    submitted by /u/xiaolongzhu [link] [comments]  ( 1 min )
    Parallel environments affect Q learning performance
    Parallel environments can make the learning much more efficient, but sometimes using parallel environments seemed to affect the performance. Does anyone know why, and how to solve it? submitted by /u/Traditional-Ad4492 [link] [comments]  ( 1 min )
    Papers for Green Vehicle Routing Problem
    My current work requires me to work with GVRP. The specific problem is for an EV (agent) to find the optimal route to deliver items based on charge consumption and not discharging. I'm currently modelling it as a graph data structure, although it could be different. Are there any state-of-the-art papers (both RL- and exact-method-based) that I can look into regarding this? I'm also interested in earlier papers that can help me build understanding, so I can increase complexity slowly. Thanks for the help! submitted by /u/evilBotman [link] [comments]  ( 1 min )
    What is the current SOTA for single-threaded continuous-action control using RL?
    As above. I am interested in RL for robotics, specifically for legged locomotion. I wish to explore RL training on the real robot. Sample efficiency is paramount. Has any progress been made by utilizing, say, RNNs/LSTMs or even attention? submitted by /u/pakodanomics [link] [comments]  ( 1 min )
    Is it possible/useful to allow the agent to influence the observations in later timesteps?
    In my current environment, the agent is fed observations from a particular mathematical computation based upon the underlying state. This mathematical computation has hyper-parameters that influence the resulting observation given to the agent. Does it make any sense whatsoever to give these hyper-parameters to the agent within the action space? In this way the agent would be able to adjust its future observations, and perhaps it would be able to use this in a smart way. On the other hand, I've heard that changing the environment from under the agent's feet during training can lead to major issues with training. One can imagine that learning a dynamic environment is harder than a static one. Any thoughts? submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    treequeues: transfer JAX pytrees between processes at very high speed!
    Hello! If you are using jax and you need to pass some pytrees between processes, I may have something for you :) I developed a "treequeue". It is a queue made for a pytree's nested arrays. The transfer speed is up to 10 times higher than regular queues. This is done by utilizing shared memory arrays and avoiding pickling data. This can be very useful when developing distributed architectures, e.g. distributed reinforcement learning, where speed is of the utmost importance. In my case this implementation was very useful to remove bottlenecks when implementing RL PBT algorithms! https://github.com/thomashirtz/treequeues Cheers! submitted by /u/krenast [link] [comments]  ( 1 min )
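    The core trick the post describes — moving raw array bytes through shared memory instead of pickling them through a queue — can be sketched with nothing but the standard library (a simplified single-array illustration, not treequeues' actual API):

```python
from multiprocessing import shared_memory
import array

def put_array(values):
    """Copy a flat list of floats into a new shared-memory block.
    In practice you would send shm.name (a short string) through a queue
    instead of pickling the data itself."""
    buf = array.array("d", values)
    shm = shared_memory.SharedMemory(create=True, size=len(buf) * buf.itemsize)
    shm.buf[: len(buf) * buf.itemsize] = buf.tobytes()
    return shm

def get_array(shm, n):
    """Read n doubles back out of the shared-memory block."""
    out = array.array("d")
    out.frombytes(bytes(shm.buf[: n * 8]))
    return list(out)

shm = put_array([1.0, 2.0, 3.0])
received = get_array(shm, 3)   # a consumer process would do this via shm.name
shm.close()
shm.unlink()
```

A pytree version would do this once per leaf array and ship only the tree structure plus the block names through the queue, which is where the speedup over pickling comes from.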
    Could you recommend a paper on self-supervised RL related to catastrophic forgetting?
    Hi, I have a problem where the agent forgets good states during self-supervised pre-training. It shows good exploration, but as time goes by, only the edge cases are explored, leading to poor performance at fine-tuning. I found a paper about this before, but it is really hard to find again. It was related to reward. Could you recommend a paper related to this? Thanks for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    [D] Any independent researchers ever get published/into conferences?
    For personal context: I worked overtime during my bachelors and took grad classes/started research, but my financial situation took a hit for the worse and, well, let's say I don't have enough money for a Masters/to continue my studies. I'm job hunting now, but in a chicken-and-egg problem, most of these jobs want at the very least research experience (which I know how to conduct), for which they have the resources. Either way I have a couple of research topics I want to explore, but limited resources, and want to be realistic here. Main question: do you know of anyone who has done it in a non-grad/non-institutional way, without the help of established figures in the field? I only know of those that have done it in the way I describe above (either under an established researcher's team, a university, or a company). Add-on: would appreciate any personal experiences as well, and if anyone has experience with conferences/how long it takes, etc. My research experience has been largely under NDAs so I haven't experienced formal publishing yet (but want to!) submitted by /u/robml [link] [comments]  ( 2 min )
    [D] feature embeddings extraction for image classification
    Is there any rule of thumb to decide from which layer to extract feature embeddings for classification tasks based on kNN? Does it depend on the architecture (ConvNets vs. ViT models, etc.)? Should I extract before or after the activation function? Edit: Which model do you suggest starting from? I'm currently using a ResNet50 trained with DINO. submitted by /u/Rich_Freedom98 [link] [comments]  ( 1 min )
    [P] Terraform Provider Iterative (TPI) - plugin for ML/AI workloads to spot instance recovery & auto-termination on AWS, GCP, Azure, Kubernetes
    Terraform Provider Iterative (TPI) addresses the specific needs of machine learning teams - it is an open-source tool extending the functionality of Terraform, the world's most widely used multi-cloud provisioning product. The tool enables full lifecycle management of computing resources and is designed specifically for machine learning pipelines: Terraform plugin for machine learning workloads: spot instance recovery & auto-termination | AWS, GCP, Azure, Kubernetes The tool aims to bridge the gap between devops and data science teams: it builds on top of Terraform, a tool universally familiar to devops teams, but extends it to suit machine learning needs. It provides the following advantages for your ML workflow: Lower cost: use your preferred cloud provider's existing pricing, including on-demand per-second billing and bulk discounts. Auto-recovery: spot/preemptible instances are cheap but unreliable. TPI reliably and automatically respawns such interrupted instances, caching & restoring the working directory in the cloud even when you are offline. Custom spec: full control over hardware & software requirements via a single config file. submitted by /u/cmstrump [link] [comments]  ( 1 min )
    [R] Flamingo: a Visual Language Model for Few-Shot Learning (from DeepMind)
    Paper (pdf). A link to the paper is also in this blog post. Abstract: Building models that can be rapidly adapted to numerous tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. Flamingo models include key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endow them with in-context few-shot learning capabilities. We perform a thorough evaluation of the proposed Flamingo models, exploring and measuring their ability to rapidly adapt to a variety of image and video understanding benchmarks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer, captioning tasks, which evaluate the ability to describe a scene or an event, and close-ended tasks such as multiple choice visual question-answering. For tasks lying anywhere on this spectrum, we demonstrate that a single Flamingo model can achieve a new state of the art for few-shot learning, simply by prompting the model with task-specific examples. On many of these benchmarks, Flamingo actually surpasses the performance of models that are fine-tuned on thousands of times more task-specific data. submitted by /u/Wiskkey [link] [comments]  ( 1 min )
    [D] Self-Organizing Maps and Principal Component Analysis.
    Hello all, I am doing my PhD on numerical modelling of solar radiation and I am researching SOMs as a possible tool. I have this problem where I can't decide on a network size, so I have been trying to develop some means to compare SOM neurons from different-sized networks. I have read that the 2-d SOM array tends to span the two highest-variance principal components. So would it be adequate to tag my SOM nodes with the first two projections of PCA, so that I can compare results from different experiments? submitted by /u/juliancanellas [link] [comments]  ( 1 min )
    [Discussion] Same MAE Result
    Hello guys, I'm wondering if you can help: we use machine learning models to identify trading opportunities using historical trading data of price and market indicators. The data we use is fine: abundant, accurate and (manually) proven to work over several non-sequential months in the past. The problem is that we keep on getting the exact same MAE result. Any suggestions on solving this issue would be immensely appreciated. submitted by /u/AFAC1410 [link] [comments]  ( 1 min )
    [P] New blog post: how to automatically find label errors in your audio datasets!
    Hi folks, our blog post on how to automatically find label errors in audio datasets has just gone live. We cover the steps to: ⛏️ Perform feature extraction (aka embeddings) on the Spoken Digit dataset with a pre-trained PyTorch model. 🔢 Use cross-validation to generate out-of-sample predicted probabilities for every example in the dataset. 🏷️ Run one line of cleanlab code on these predicted probabilities to identify which audio clips may be mislabeled. 📰 Blog Post + Google Colab: https://cleanlab.ai/blog/label-errors-audio-datasets/ https://preview.redd.it/vpzhwg6jebw81.png?width=1260&format=png&auto=webp&s=3fa79d8a097ae5936e5e6a51e87508adf6190835 submitted by /u/weijinglok [link] [comments]  ( 1 min )
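    For the curious, the idea behind that one line of cleanlab code can be sketched in plain Python (a simplified illustration of the "self-confidence" ranking, not cleanlab's actual API, which does additional filtering and calibration):

```python
# pred_probs[i][j]: out-of-sample probability that example i belongs to class j
pred_probs = [
    [0.9, 0.1],   # labeled 0, model agrees
    [0.2, 0.8],   # labeled 0, model strongly disagrees -> likely label error
    [0.6, 0.4],   # labeled 1, model mildly disagrees
]
labels = [0, 0, 1]

def rank_by_self_confidence(labels, pred_probs):
    """Rank example indices by the probability assigned to their GIVEN label.
    Low self-confidence = the model doubts the label = worth reviewing."""
    conf = sorted((pred_probs[i][y], i) for i, y in enumerate(labels))
    return [i for _, i in conf]

suspects = rank_by_self_confidence(labels, pred_probs)   # most suspect first
```

The out-of-sample part matters: probabilities must come from cross-validation, otherwise the model has already memorized the (possibly wrong) labels it is being asked to doubt.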
    Shazam App for your brain? [R]
    https://arxiv.org/abs/2202.03265?context=eess Computer vision approach to processing naturalistic brain responses to music. Models achieved state of the art, and they used publicly available datasets. With 1 sec of brain signal they can classify the name of the song you're listening to and how much you enjoyed it, at ~89% accuracy. Imagine this but on your AirPods; seems wild. submitted by /u/blackliquerish [link] [comments]
    [D] What roles were you able to receive as a PhD with no top tier research?
    It’s looking more and more like I won’t be able to get any top tier publications in my PhD (NeurIPS, ICLR, etc.). I’ve tried, but various factors have limited the experience. I’m curious what people were able to do in this circumstance. I’ll have 3-5 publications in mid-tier venues, plus many years of experience as a software engineer and machine learning engineer (in non-tech companies). People in my program have gone on to work at Amazon, and a few Google, but these were all applied scientist and engineering roles. Does anyone go on to work as a research scientist? National labs seem like a good bet, I’m curious what others have done. submitted by /u/walterkronkite33 [link] [comments]  ( 5 min )
    [D] Have we stopped researching agents?
    It seems to me that the ML research community has stopped talking about agents - as in, machines that act in complex environments by themselves. Ever since GPT-3 came out, basically every single paper has been related to NLP or Transformers in general. It seems a natural next step would be to try to get agents we could tell what to do and how to do it in natural language, no? Yet that has not been the focus at all. The last time we had widespread focus on planning and acting was the Starcraft challenge, but that was "solved" so quickly nothing much came of it. Research I know about: - Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents - Tesla in general I guess, with their cars and maybe robots Is it just me, or is there just no interest in researching more capable agents right now? submitted by /u/ReasonablyBadass [link] [comments]  ( 2 min )
    [D] Advertisement Allocation models being used?
    There are a bunch of papers on different advertisement allocation techniques (placing ads in valid ad spots), and in algorithms in grad school I learned MaxFlow/MinCut, but I'm wondering what is actually being used in production systems? Linear models, genetic algorithms, reinforcement learning, some neural network? I know that very little of this is published/talked about due to being proprietary. However, I'm wondering because so much gets published, yet the more advanced algorithms tend to be complex and not necessarily deployable in production systems (or so I suspect). submitted by /u/Flipper3 [link] [comments]  ( 2 min )
    [D] ML model dev pipelines
    Hi Folks! It'd really help if you can participate in this poll and share about your ML/DL model workflow before moving it to production. Out of these steps, which one is your most preferred way of integrating ML into your app or business (not specific to any domain): View Poll submitted by /u/fgp121 [link] [comments]  ( 1 min )
    [D] What is the recommended way to estimate norm for gradient clipping?
    I have read several blogs in which they specified that you should clip your gradients to the largest value that doesn't cause exploding gradients. But that also means that you could get a situation that you don't need to do gradient clipping at all. Could that hurt convergence speed in some way? Is there any guideline for how to pick this hyperparameter based on learning rate, batch size, model size, input size (seq len, img size) etc.? Or is this project/use case dependent and it represents "just another hyperparameter"? submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 1 min )
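    One common heuristic (a rule of thumb, not an official guideline): run a few hundred steps without clipping while logging the global gradient norm, then set the threshold to a high percentile of the observed norms. Frameworks provide this as e.g. clip_grad_norm_ in PyTorch, but the mechanics fit in a few lines:

```python
import math

def clip_by_global_norm(grads, max_norm):
    """If the global L2 norm of the gradients exceeds max_norm,
    rescale all of them by max_norm / total_norm; otherwise leave them alone."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        grads = [g * (max_norm / total_norm) for g in grads]
    return grads, total_norm

# With max_norm well above typical norms, clipping is a no-op most steps,
# which answers the "what if I never need it" case: it only fires on spikes.
clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Because clipping only rescales (it never changes the gradient direction), an occasionally-inactive clip rarely hurts convergence; a threshold far below typical norms effectively shrinks the learning rate instead.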
    [P] HuSpaCy: Industrial-strength Hungarian NLP
    I'd like to show off a Hungarian NLP pipeline which we've been heavily improving over the past year. https://github.com/huspacy/huspacy While processing Hungarian texts might not be interesting for the most of you, I believe this project can be a good learning resource, as our models utilize the latest NLP technologies from Explosion and are fully reproducible: We've created a transformers-based model on top of a language specific BERT model Multi-task and transfer-learning is heavily used across the models. Incorporated edit-tree lemmatization and biaffine parsing from spacy-experimental We provide word embeddings using the (fastText-like) floret tool submitted by /u/oroszgy [link] [comments]  ( 1 min )
    [D] How do you decide the ranges for hyperparameters when doing a grid search or a random search?
    I'm trying to tune the parameters of two models - a random forest classifier, and a gradient boosting classifier. When using a grid search or a random search (I might also play around with the genetic algorithm), what are appropriate ranges to use / how can I come to know them? I obviously can't just specify a very arbitrary range, because that might decrease the effectiveness of the random search and make the grid search too long. Do they depend on the number of features in the dataset or other characteristics (e.g. our dataset is quite imbalanced and we're using SMOTE resampling)? I'm just trying to tune n_estimators and max_depth, but RandomForestClassifier also has many other parameters; do I just experiment with all of them, or are there any known parameters that don't do anything useful unless I'm trying for an extra 0.1% of accuracy or recall? Basically the same questions as above for the GradientBoostingClassifier. Letting go of specific models for a second, I'd appreciate generic pointers on how to deliberately tune hyperparameters or specify ranges, instead of just arbitrarily specifying something and hoping it works. Thanks! submitted by /u/stuffingmybrain [link] [comments]  ( 2 min )
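    One practical pattern for specifying ranges (the numbers below are illustrative assumptions, not canonical values): sample parameters that span orders of magnitude, like n_estimators or learning rates, on a log scale, and bounded integer parameters like max_depth uniformly:

```python
import math
import random

random.seed(0)

def log_uniform(low, high):
    """Sample uniformly on a log scale: good for ranges spanning magnitudes,
    so half the trials don't pile up in the top half of the range."""
    return math.exp(random.uniform(math.log(low), math.log(high)))

# Illustrative ranges only; refine them around the best trials after a coarse pass.
trials = [
    {"n_estimators": int(log_uniform(50, 1000)),
     "max_depth": random.randint(3, 20)}
    for _ in range(20)
]
```

Each sampled dict can be fed to a model constructor (e.g. scikit-learn's RandomForestClassifier via keyword unpacking); a coarse random search first, then a narrowed second pass, usually beats one giant grid.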
    How to do meaningful work as an independent researcher? [Discussion]
    With big players like OpenAI and Google building these massive models, how do independent researchers without access to such scale and compute do meaningful work? I came across tweets from researchers, especially ones working on generative models, saying they feel their work looks irrelevant after seeing results from DALL-E 2. It feels like just a couple of years ago, if you had a decent GPU setup, you could pretty much do world-class research. It doesn't look like it anymore. Are there any research directions that make it a level playing field, where compute and scale are not necessarily the solution, or are we all doomed to be prompt engineers for GPT models? submitted by /u/HairyIndianDude [link] [comments]  ( 4 min )
  • Open

    How long before dall-e 2 (or similarly capable) produces porn ?
    Dall-e 2 was released a few weeks ago and is exceptionally capable at generating images from a text description. However, training data containing porn has been filtered out, as have requests that contain NSFW terms. So it is not suited for generating porn for now. However, how long would it take before: Someone builds something as capable and allows porn; OpenAI opens dall-e 2 to such content. (edit) This is an open discussion. But there is clear evidence that the technology now exists to create, on demand and instantly, any kind of porn with any kind of kink in a lot of styles, including photorealistic style. submitted by /u/exotic_deviantcy [link] [comments]
    Augmented Military Soldiers
    We often believe that future soldiers are going to be robots. There will no longer be human "boots on the ground" thanks to artificial intelligence, drones, etc. Many of us fail to realize that current soldiers, in militaries across the globe, are being augmented with AI as we speak. Things like AR headsets, computerized sights, and other technologies are going to turn them into cyborgs...Another one of the intriguing aspects of this is that major companies like Microsoft are creating these technologies for the military. How long until we see augmented soldiers going against one another? https://www.linkedin.com/pulse/augmented-soldier-mvyl-associates-1f/?trackingId=FnvaWMtxMsfm2fJeQP1bgg%3D%3D submitted by /u/IsabeldeMontoya [link] [comments]  ( 1 min )
    A Parable Of Explainability
    submitted by /u/elcric_krej [link] [comments]
    AI Dream 18 - Visual Trip through Wonderland
    submitted by /u/LordPewPew777 [link] [comments]
    Community College AI Program Needs
    I am the instructor and lead of the AI program at a community college in North Carolina. I am developing the program currently and have money to spend on Cool Educational Robotics and potentially other tools. Does anybody have any recommendations for robots to teach python with classic search, reinforcement learning methods, or other university level algorithms? A robotic arm would be nice, but I need to tie it directly to a potential project in machine learning, deep learning, reinforcement learning, search, logic, computer vision, etc. We have a NAO robot and are getting a LIMO: https://www.robotlab.com/store/limo-agilex But we could probably use an Educational Robotic Arm Thanks! Our program is one of the first Associates programs in AI in the country. Check out our program website. It is completely available online and we accept out of state students. https://www.waynecc.edu/programs/ai/ Feel free to ask any questions as well. *Also, advice on best value cloud computing virtualized resources for high performance computing for deep learning projects is appreciated. We are currently planning on going with Microsoft's Data Science Virtual Machines but not sure which series exactly submitted by /u/Wayne_CC_AI [link] [comments]  ( 1 min )
    PeopleLens AI Helps The Blind | Brain Fingerprints Detect Autism | AI Predicts Cancer Tumor Regrowth
    submitted by /u/getrich_or_diemining [link] [comments]
    Uhhhh
    submitted by /u/Blazeolmo [link] [comments]
    Dall-E 2 access
    Hello everyone, I wanted to know if one could get access to the dall-e 2 app without having to hope to be chosen from the wait list. Thanks! submitted by /u/Swaggyswaggerson [link] [comments]  ( 1 min )
    A brief history of deepfakes - how it started and where it might take us
    submitted by /u/much_successes [link] [comments]
    Artificial Intelligence of Things (AIoT) - Latest Technology, Future Upcoming Trends and Forecast to 2028
    Our latest research report on the Artificial Intelligence of Things (AIoT) market shows how that market is changing and how trends in demographics, business cycles, and microeconomics affect the AIoT market as a whole. Our study of the global AIoT market demonstrates what is happening with the state of the business by looking at production value and key regions. The market report provides a complete analysis of sales volume, pricing, revenue, profit margin, and the expansion rate within the AIoT market. Get A Free Sample Report @ https://www.intelligencemarketreport.com/report-sample/573982 "Artificial Intelligence of Things" key information extracted from the report: Extensive information on factors estimated to affect the market growth and market share during the forecast period is presented. The report offers the present scenario and future growth prospects of the market in various geographical regions. A competitive landscape analysis of the market, as well as qualitative and quantitative information, is delivered. A SWOT analysis is conducted along with Porter's Five Forces analysis. The in-depth analysis provides an insight into the market, underlining the growth rate and opportunities offered in the business. Leading key players included in this report are: Twilio Inc., ShiftPixy Inc., Micron Technology, Intel, IBM, Gopher Protocol, Deep Vision, Ceva, ALCES, AISPEECH. submitted by /u/Purva_Duggal [link] [comments]
    Free Webinar on Automated CV pipelines | Video Predictions
    The fourth part of the Automated CV Pipelines webinar series is open for registration. It will cover some of the best practices for video-specific annotation tasks. If you are interested, you can check out the details here! submitted by /u/WeekendClassic [link] [comments]
    Stairway to (A.I. animation + sound design)
    submitted by /u/nenomancer [link] [comments]  ( 1 min )
    Ancient civs always burn (A.I. animation + some sound design)
    submitted by /u/nenomancer [link] [comments]
    Treasure Planet made with starryai
    submitted by /u/Losthel [link] [comments]
    Elon Musk's NEURALINK vs Bryan Johnson's KERNEL (No Surgery)
    submitted by /u/1024cities [link] [comments]  ( 1 min )
    Sentiment, cognitive distortions and market data time series - causal analysis for crypto markets
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    High Tech Hacks 2022 ! !
    Hey guys! I’m excited to share with you an exciting upcoming hackathon, High Tech Hacks 2.0! High Tech Hacks is a free, international 24-hour hackathon on May 21-22nd, 2022 open to all high schoolers hoping to learn a new coding skill, compete for awesome prizes, or work with other like-minded hackers. Let’s invent, create, and push the boundaries of technology (as much as we can at one hackathon)! What to expect: Last year, participants learned the basics of web development, Python, virtual reality, and how to make a Discord bot from current software engineers at Microsoft, Amazon, Twilio, other tech companies, and Columbia University SHPE. Thanks to our company sponsors, each participant last year received nearly $400 worth of free software and swag. Register to earn FREE swag (t-shirts, water bottles, stickers!) Network with other passionate STEM high school students from around the world! (Last year we had participants from 26 countries signed up already!) This year we have even bigger prizes, competitions, and speakers so stay tuned! Reach out to me with more questions or email [hightechhackathon@gmail.com](mailto:hightechhackathon@gmail.com). Happy hacking! :D Sign up here to confirm your interest and get on our mailing list: Click Here to Register! Also, meet other hackers by Joining our Discord! For more, Check out our Website submitted by /u/HighTechHacks [link] [comments]  ( 1 min )
    MyStyle: The Best AI Face Manipulation to Date!
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
  • Open

    Amazon Rekognition introduces Streaming Video Events to provide real-time alerts on live video streams
    Today, AWS announced the general availability of Amazon Rekognition Streaming Video Events, a fully managed service for camera manufacturers and service providers that uses machine learning (ML) to detect objects such as people, pets, and packages in live video streams from connected cameras. Amazon Rekognition Streaming Video Events sends them a notification as soon as […]  ( 8 min )
    3xLOGIC uses Amazon Rekognition Streaming Video Events to provide intelligent video analytics on live video streams to monitoring agents
    3xLOGIC is a leader in commercial electronic security systems. They provide commercial security systems and managed video monitoring for businesses, hospitals, schools, and government agencies. Managed video monitoring is a critical component of a comprehensive security strategy for 3xLOGIC’s customers. With more than 50,000 active cameras in the field, video monitoring teams face a daily […]  ( 4 min )
    Abode uses Amazon Rekognition Streaming Video Events to provide real-time notifications to their smart home customers
    Abode Systems (Abode) offers homeowners a comprehensive suite of do-it-yourself home security solutions that can be set up in minutes and enables homeowners to keep their family and property safe. Since the company’s launch in 2015, in-camera motion detection sensors have played an essential part in Abode’s solution, enabling customers to receive notifications and monitor […]  ( 6 min )
    Pandas user-defined functions are now available in Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler reduces the time to aggregate and prepare data for machine learning (ML) from weeks to minutes. With Data Wrangler, you can select and query data with just a few clicks, quickly transform data with over 300 built-in data transformations, and understand your data with built-in visualizations without writing any code. Additionally, […]  ( 4 min )
    How Searchmetrics uses Amazon SageMaker to automatically find relevant keywords and make their human analysts 20% faster
    Searchmetrics is a global provider of search data, software, and consulting solutions, helping customers turn search data into unique business insights. To date, Searchmetrics has helped more than 1,000 companies such as McKinsey & Company, Lowe’s, and AXA find an advantage in the hyper-competitive search landscape. In 2021, Searchmetrics turned to AWS to help with […]  ( 5 min )
    Identify paraphrased text with Hugging Face on Amazon SageMaker
    Identifying paraphrased text has business value in many use cases. For example, by identifying sentence paraphrases, a text summarization system could remove redundant information. Another application is to identify plagiarized documents. In this post, we fine-tune a Hugging Face transformer on Amazon SageMaker to identify paraphrased sentence pairs in a few steps. A truly robust […]  ( 10 min )
    How Moovit turns data into insights to help passengers avoid delays using Apache Airflow and Amazon SageMaker
    This is a guest post by Moovit’s Software and Cloud Architect, Sharon Dahan. Moovit, an Intel company, is a leading Mobility as a Service (MaaS) solutions provider and creator of the top urban mobility app. Moovit serves over 1.3 billion riders in 3,500 cities around the world. We help people everywhere get to their destination […]  ( 7 min )
  • Open

    How DNEG Helped Win Another Visual-Effects Oscar by Bringing ‘Dune’ to Life With NVIDIA RTX
    Featuring stunning visuals from futuristic interstellar worlds, including colossal sand creatures, Dune captivated audiences around the world. The sci-fi film picked up six Oscars last month at the 94th Academy Awards, including for Best Sound and Visual Effects. Adapted from Frank Herbert’s 1965 novel of the same name, Dune tells the story of Paul Atreides, Read article > The post How DNEG Helped Win Another Visual-Effects Oscar by Bringing ‘Dune’ to Life With NVIDIA RTX appeared first on NVIDIA Blog.  ( 4 min )
    Your Odyssey Awaits: Stream ‘Lost Ark’ to Nearly Any Device This GFN Thursday
    It’s a jam-packed GFN Thursday. This week brings the popular, free-to-play, action role-playing game Lost Ark to gamers across nearly all their devices, streaming on GeForce NOW. And that’s not all. GFN Thursday also delivers an upgraded experience in the 2.0.40 update. M1-based MacBooks, iMacs and Mac Minis are now supported natively. Plus, membership gift… The post Your Odyssey Awaits: Stream ‘Lost Ark’ to Nearly Any Device This GFN Thursday appeared first on NVIDIA Blog.  ( 4 min )
    Top 11 Retail Technology Trends For 2022
    The COVID-19 pandemic became a catalyst for retail businesses and their customers. Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 6 min )
    How can we reduce the carbon footprint of global computing?
    Workshop hosted by MIT’s Climate and Sustainability Consortium, MIT-IBM Watson AI Lab, and the MIT Schwarzman College of Computing highlights how new approaches to computing can save energy and help the planet.  ( 8 min )
    Aging Brain Initiative awards fund five new ideas to study, fight neurodegeneration
    Competitive seed grants launch yearlong investigations of novel hypotheses about potential causes, biomarkers, treatments of Alzheimer’s and ALS.  ( 6 min )
    Help to create a simple Neural Network... or find the bug
    Hello, I really need help creating a simple neural network that approximates the Rosenbrock function (a function of two variables). I have used the book "MATLAB Deep Learning" by Phil Kim, which provides code examples. However, when I build a simple example where the network is trained with backpropagation, the outputs at the testing points simply come out as all ones. It is very early work, so the code is very simple; I have cut out just the neural-network piece. Can someone give me some tips or help me figure out what is wrong?

    %% Neural Network
    % Rosenbrock function: f = (1 - x1).^2 + 100*(x2 - x1.^2).^2
    % X:      25x2 matrix of sample points
    % f_samp: 25x1 vector of sampled function values

    nHNodes = 4 ;   % number of nodes in the hidden layer
    nInputs = 2 ;   % input layer has two nodes; output layer has one

    % Initial weights
    W1 = 2*rand(nHNodes, nInputs) - 1 ;
    W2 = 2*rand(1, nHNodes) - 1 ;
    alpha = 0.5 ;   % learning rate

    % Backpropagation (the epoch loop must not reuse i as its index)
    for epoch = 1:10000
        for i = 1:size(X,1)
            x = X(i, :)' ;
            d = f_samp(i) ;
            % Forward pass
            v1 = W1*x ;
            y1 = fun_sigmoid(v1) ;
            v  = W2*y1 ;
            y  = fun_sigmoid(v) ;
            % Backward pass
            e      = d - y ;
            delta  = y.*(1-y).*e ;
            e1     = W2'*delta ;
            delta1 = y1.*(1-y1).*e1 ;
            W1 = W1 + alpha*delta1*x' ;
            W2 = W2 + alpha*delta*y1' ;
        end
    end

    % Testing at the sampling points
    for i = 1:size(X,1)
        x  = X(i, :)' ;
        v1 = W1*x ;
        y1 = fun_sigmoid(v1) ;
        v  = W2*y1 ;
        y(i) = fun_sigmoid(v) ;
    end

    submitted by /u/TobiasFred  ( 1 min )
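    A likely culprit worth checking before the backprop math: the Rosenbrock targets in f_samp reach values in the hundreds (e.g. f(-1, -1) = 404), while fun_sigmoid on the output node can never exceed 1, so training saturates the output at ~1 for every input. Below is a minimal, hypothetical Python sketch (a 2-4-1 network of my own, not the poster's code) that normalizes the targets, uses a linear output unit, and checks that the training error actually drops:

```python
import math, random

random.seed(0)

def rosenbrock(x1, x2):
    return (1 - x1) ** 2 + 100 * (x2 - x1 ** 2) ** 2

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# 25 sample points on a 5x5 grid over [-1, 1]^2; targets scaled into [0, 1]
grid = [(-1 + 0.5 * i, -1 + 0.5 * j) for i in range(5) for j in range(5)]
raw = [rosenbrock(x1, x2) for x1, x2 in grid]
targets = [t / max(raw) for t in raw]

n_hidden, alpha = 4, 0.05
W1 = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
W2 = [random.uniform(-1, 1) for _ in range(n_hidden)]
b2 = 0.0

def forward(x):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(W2, h)) + b2   # linear output: no squashing
    return h, y

def mse():
    return sum((forward(x)[1] - t) ** 2 for x, t in zip(grid, targets)) / len(grid)

loss_before = mse()
for epoch in range(2000):
    for x, t in zip(grid, targets):
        h, y = forward(x)
        e = y - t                                  # output error (identity unit)
        for k in range(n_hidden):
            gh = e * W2[k] * h[k] * (1 - h[k])     # hidden-layer delta
            W2[k] -= alpha * e * h[k]
            for m in range(2):
                W1[k][m] -= alpha * gh * x[m]
            b1[k] -= alpha * gh
        b2 -= alpha * e
loss_after = mse()
```

    Keeping a sigmoid on the output while targets exceed 1 would make the best achievable fit "everything close to 1", which matches the symptom described above.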
    I need help making a neural network from scratch in python
    Here's my code: https://github.com/heyuhowudoin/mnist_ai I currently have 3 layers of 15, 15, and 10 neurons respectively. I'm using the MNIST database to get images of hand-drawn digits and trying to classify them by which neuron in the final layer has the highest activation. I use a combination of sigmoid and ReLU as activation functions, and I am using mini-batches of 100 pictures each. What I see happening is that the network converges to a point where every neuron in the final layer outputs 0, because that gives a relatively low error: the target for 9 out of 10 of those neurons is indeed 0, while the target for the correct output neuron is 1. I've looked over my backprop algorithm, tried changing the number of neurons in each hidden layer to 200, and changed the learning rate a bunch, but I can't figure out why it's not working, so if one of you guys wouldn't mind helping me, that would be much appreciated. submitted by /u/-i-hate-this-place-  ( 1 min )
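    One common cause of exactly this collapse (not necessarily the only one in the linked code) is pairing sigmoid outputs and a squared-error loss with one-hot targets: once a logit is strongly negative, the sigmoid derivative y(1-y) shrinks the gradient toward zero, so "all outputs at 0" is nearly a fixed point. Softmax with cross-entropy keeps a large gradient on the correct class. A small sketch of the two gradients in that regime:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)                     # subtract max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

# 10 output logits, all strongly negative: the "every neuron outputs ~0" regime
logits = [-6.0] * 10
target = 0                          # index of the correct class

# sigmoid + MSE: dL/dz_c = (y_c - t_c) * y_c * (1 - y_c), vanishes as y_c -> 0
y_c = sigmoid(logits[target])
grad_mse = (y_c - 1.0) * y_c * (1 - y_c)

# softmax + cross-entropy: dL/dz_c = p_c - t_c, stays near -1
p = softmax(logits)
grad_ce = p[target] - 1.0
```

    Here |grad_ce| is hundreds of times larger than |grad_mse|, so the correct logit keeps getting pushed up instead of stalling.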
    AlphaGo Zero
    Hello, I have an AlphaGo Zero question. Why doesn’t AlphaGo Zero use Q(s,a) to choose its next move in the Monte Carlo tree search? Why does it use π instead? submitted by /u/Skinnybisquit
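    For context, the short answer usually given: at the root, AlphaGo Zero plays (or samples) from π(a) ∝ N(s,a)^(1/τ), the normalized visit counts, because a visit count aggregates many value estimates, while the Q of a rarely explored action can be a noisy outlier. A toy sketch with made-up root statistics:

```python
# Hypothetical root statistics after a search: one rarely visited action has a
# high but noisy Q estimate; a well-explored action has a solid Q backed by
# many visits (PUCT already steered most visits toward the better action).
visits = {"a1": 900, "a2": 80, "a3": 20}
q_vals = {"a1": 0.55, "a2": 0.40, "a3": 0.95}   # a3's Q rests on only 20 rollouts

tau = 1.0                                       # temperature
total = sum(n ** (1.0 / tau) for n in visits.values())
pi = {a: n ** (1.0 / tau) / total for a, n in visits.items()}

move_by_q = max(q_vals, key=q_vals.get)         # picks the noisy outlier a3
move_by_pi = max(pi, key=pi.get)                # picks the well-supported a1
```

    Lowering τ toward 0 makes π concentrate on the most-visited move, which is how deterministic play is recovered late in a game.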
    Fast Aquatic Swimmer Optimization with Differentiable Projective Dynamics and Neural Network Hydrodynamic Models. (arXiv:2204.12584v1 [cs.RO])
    Aquatic locomotion is a classic fluid-structure interaction (FSI) problem of interest to biologists and engineers. Solving the fully coupled FSI equations for incompressible Navier-Stokes and finite elasticity is computationally expensive. Optimizing robotic swimmer design within such a system generally involves cumbersome, gradient-free procedures on top of the already costly simulation. To address this challenge, we present a novel, fully differentiable hybrid approach to FSI that combines a 2D direct numerical simulation for the deformable solid structure of the swimmer and a physics-constrained neural network surrogate to capture hydrodynamic effects of the fluid. For the deformable simulation of the swimmer's body, we use state-of-the-art techniques from the field of computer graphics to speed up the finite-element method (FEM). For the fluid simulation, we use a U-Net architecture trained with a physics-based loss function to predict the flow field at each time step. The pressure and velocity field outputs from the neural network are sampled around the boundary of our swimmer using an immersed boundary method (IBM) to compute its swimming motion accurately and efficiently. We demonstrate the computational efficiency and differentiability of our hybrid simulator on a 2D carangiform swimmer. Since both the solid simulator and the hydrodynamics model are automatically differentiable, we obtain a fully differentiable FSI simulator that can be used for computational co-design of geometry and controls for rigid and soft bodies immersed in fluids, such as minimizing drag, maximizing speed, or maximizing efficiency via direct gradient-based optimization.  ( 2 min )
    Zero-Touch Network on Industrial IoT: An End-to-End Machine Learning Approach. (arXiv:2204.12605v1 [cs.LG])
    The Industry 4.0-enabled smart factory is expected to realize the next revolution for manufacturers. Although artificial intelligence (AI) technologies have improved productivity, current use cases belong to small-scale and single-task operations. To unlock the potential of the smart factory, this paper develops zero-touch network systems for intelligent manufacturing and facilitates distributed AI applications in both the training and inference stages in a large-scale manner. The open radio access network (O-RAN) architecture is first introduced for the zero-touch platform to enable global control of communication and computation infrastructure capability in the field. The designed serverless framework allows intelligent and efficient learning assignments and resource allocations. Hence, requested learning tasks can be assigned to appropriate robots, and the underlying infrastructure can be used to support the learning tasks without expert knowledge. Moreover, thanks to the proposed network system's flexibility, powerful AI-enabled networking algorithms can be utilized to ensure service-level agreements and superior performance for factory workloads. Finally, three open research directions (backward compatibility, end-to-end enhancements, and cybersecurity) are discussed for the zero-touch smart factory.  ( 2 min )
    Data-driven detector signal characterization with constrained bottleneck autoencoders. (arXiv:2203.04604v4 [physics.ins-det] UPDATED)
    A common technique in high energy physics is to characterize the response of a detector by means of models tuned to data, which build parametric maps from the physical parameters of the system to the expected signal of the detector. When the underlying model is unknown, it is difficult to apply this method, and simplifying assumptions are often made, introducing modeling errors. In this article, using a waveform toy model, we show how deep learning in the form of constrained bottleneck autoencoders can be used to learn the underlying unknown detector response model directly from data. The results show that excellent performance can be achieved even when the signals are significantly affected by random noise. The trained algorithm can be used simultaneously to estimate the physical parameters of the model, to simulate the detector response with high fidelity, and to denoise detector signals.  ( 2 min )
    A Survey on Machine Learning Approaches for Modelling Intuitive Physics. (arXiv:2202.06481v2 [cs.LG] UPDATED)
    Research in cognitive science has provided extensive evidence of human cognitive ability in performing physical reasoning about objects from noisy perceptual inputs. Such a cognitive ability is commonly known as intuitive physics. With advancements in deep learning, there is increasing interest in building intelligent systems capable of performing physical reasoning from a given scene, with the aim of building better AI systems. As a result, many contemporary approaches to modelling intuitive physics for machine cognition have been inspired by literature from cognitive science. Despite the wide range of work on physical reasoning for machine cognition, there is a scarcity of reviews that organize and group these deep learning approaches. Especially at the intersection of intuitive physics and artificial intelligence, there is a need to make sense of the diverse range of ideas and approaches. Therefore, this paper presents a comprehensive survey of recent advances and techniques in intuitive-physics-inspired deep learning approaches for physical reasoning. The survey first categorizes existing deep learning approaches into three facets of physical reasoning, then organizes them into three general technical approaches, and proposes six categorical tasks for the field. Finally, we highlight the challenges of the current field and present some future research directions.  ( 2 min )
    Generating Examples From CLI Usage: Can Transformers Help?. (arXiv:2204.12648v1 [cs.SE])
    Continuous evolution in modern software often causes documentation, tutorials, and examples to be out of sync with changing interfaces and frameworks. Relying on outdated documentation and examples can lead programs to fail or be less efficient or even less secure. In response, programmers need to regularly turn to other resources on the web, such as StackOverflow, for examples to guide them in writing software. We recognize that this inconvenient, error-prone, and expensive process can be improved by using machine learning applied to software usage data. In this paper, we present our practical system which uses machine learning on large-scale telemetry data and documentation corpora, generating appropriate and complex examples that can be used to improve documentation. We discuss both feature-based and transformer-based machine learning approaches and demonstrate that our system achieves 100% coverage for the used functionalities in the product, provides up-to-date examples upon every release, and reduces the number of PRs submitted by software owners writing and editing documentation by more than 68%. We also share valuable lessons learnt during the 3 years that our production-quality system has been deployed for the Azure Cloud Command Line Interface (Azure CLI).  ( 2 min )
    Generating Self-Serendipity Preference in Recommender Systems for Addressing Cold Start Problems. (arXiv:2204.12651v1 [cs.IR])
    Classical accuracy-oriented Recommender Systems (RSs) typically face the cold-start problem and the filter-bubble problem, in which users receive familiar, repeated, and even predictable recommendations that leave them bored and unsatisfied. To address these issues, serendipity-oriented RSs are proposed to recommend appealing and valuable items that deviate significantly from users' historical interactions, thus satisfying users by introducing unexplored but relevant candidate items. In this paper, we devise a novel serendipity-oriented recommender system (\textbf{G}enerative \textbf{S}elf-\textbf{S}erendipity \textbf{R}ecommender \textbf{S}ystem, \textbf{GS$^2$-RS}) that generates users' self-serendipity preferences to enhance recommendation performance. Specifically, this model extracts users' interest and satisfaction preferences, generates virtual but convincing neighbor preferences from the users themselves, and derives their self-serendipity preferences. These preferences are then injected into the rating matrix as additional information for RS models. Note that GS$^2$-RS can not only tackle the cold-start problem but also provide diverse yet relevant recommendations to relieve the filter-bubble problem. Extensive experiments on benchmark datasets illustrate that the proposed GS$^2$-RS model can significantly outperform state-of-the-art baseline approaches in serendipity measures with stable accuracy performance.  ( 2 min )
    AI-Bind: Improving Binding Predictions for Novel Protein Targets and Ligands. (arXiv:2112.13168v4 [q-bio.QM] UPDATED)
    Identifying novel drug-target interactions (DTI) is a critical and rate limiting step in drug discovery. While deep learning models have been proposed to accelerate the identification process, we show that state-of-the-art models fail to generalize to novel (i.e., never-before-seen) structures. We first unveil the mechanisms responsible for this shortcoming, demonstrating how models rely on shortcuts that leverage the topology of the protein-ligand bipartite network, rather than learning the node features. Then, we introduce AI-Bind, a pipeline that combines network-based sampling strategies with unsupervised pre-training, allowing us to limit the annotation imbalance and improve binding predictions for novel proteins and ligands. We illustrate the value of AI-Bind by predicting drugs and natural compounds with binding affinity to SARS-CoV-2 viral proteins and the associated human proteins. We also validate these predictions via auto-docking simulations and comparison with recent experimental evidence, and step up the process of interpreting machine learning prediction of protein-ligand binding by identifying potential active binding sites on the amino acid sequence. Overall, AI-Bind offers a powerful high-throughput approach to identify drug-target combinations, with the potential of becoming a powerful tool in drug discovery.  ( 2 min )
    Trusted Multi-View Classification with Dynamic Evidential Fusion. (arXiv:2204.11423v2 [cs.LG] UPDATED)
    Existing multi-view classification algorithms focus on promoting accuracy by exploiting different views, typically integrating them into common representations for follow-up tasks. Although effective, it is also crucial to ensure the reliability of both the multi-view integration and the final decision, especially for noisy, corrupted and out-of-distribution data. Dynamically assessing the trustworthiness of each view for different samples could provide reliable integration. This can be achieved through uncertainty estimation. With this in mind, we propose a novel multi-view classification algorithm, termed trusted multi-view classification (TMC), providing a new paradigm for multi-view learning by dynamically integrating different views at an evidence level. The proposed TMC can promote classification reliability by considering evidence from each view. Specifically, we introduce the variational Dirichlet to characterize the distribution of the class probabilities, parameterized with evidence from different views and integrated with the Dempster-Shafer theory. The unified learning framework induces accurate uncertainty and accordingly endows the model with both reliability and robustness against possible noise or corruption. Both theoretical and experimental results validate the effectiveness of the proposed model in accuracy, robustness and trustworthiness.  ( 2 min )
    An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate. (arXiv:2204.12554v1 [cs.LG])
    A particular direction of recent advances in stochastic deep-learning algorithms has been uncovering the rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not heavy-tailed. Moreover, the heavy-tail index is known to show interesting dependence on the input dimension of the net, the mini-batch size, and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a ReLU gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven to converge in Karmakar and Mukherjee (2022) for ReLU-realizable data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late-time iterates in this model scenario has strikingly different properties than either what has been proven for linear hypothesis classes or what has been previously demonstrated for large nets.  ( 2 min )
    Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models. (arXiv:2104.05158v6 [cs.DC] UPDATED)
    Deep learning recommendation models (DLRMs) are used across many business-critical services at Facebook and are the single largest AI application in terms of infrastructure demand in its data-centers. In this paper we discuss the SW/HW co-designed solution for high-performance distributed training of large-scale DLRMs. We introduce a high-performance scalable software stack based on PyTorch and pair it with the new evolution of the Zion platform, namely ZionEX. We demonstrate the capability to train very large DLRMs with up to 12 trillion parameters and show that we can attain a 40X speedup in terms of time to solution over previous systems. We achieve this by (i) designing the ZionEX platform with a dedicated scale-out network, provisioned with high bandwidth, optimal topology, and efficient transport; (ii) implementing an optimized PyTorch-based training stack supporting both model and data parallelism; (iii) developing sharding algorithms capable of hierarchical partitioning of the embedding tables along row and column dimensions and load balancing them across multiple workers; (iv) adding high-performance core operators while retaining flexibility to support optimizers with fully deterministic updates; and (v) leveraging reduced-precision communications, a multi-level memory hierarchy (HBM+DDR+SSD), and pipelining. Furthermore, we develop and briefly comment on distributed data ingestion and other supporting services that are required for robust and efficient end-to-end training in production environments.  ( 3 min )
    Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning. (arXiv:2204.08735v2 [cs.LG] UPDATED)
    Class-imbalanced data distributions widely exist in real-world engineering. However, mainstream optimization algorithms that seek to minimize error will trap the deep learning model in sub-optima when facing extreme class imbalance, seriously harming classification precision, especially on the minority classes. The essential reason is that the gradients of the classifier weights are imbalanced among the components from different classes. In this paper, we propose the Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients. We perform experiments on large-scale classification and segmentation datasets, and our ARB-Loss achieves state-of-the-art performance via only one-stage training instead of the two-stage learning used by current SOTA works.  ( 2 min )
    Performer: A Novel PPG to ECG Reconstruction Transformer For a Digital Biomarker of Cardiovascular Disease Detection. (arXiv:2204.11795v2 [eess.SP] UPDATED)
    Cardiovascular diseases (CVDs) have become the leading cause of death; three-quarters of these deaths occur in lower-income communities. Electrocardiography (ECG), an electrical measurement capturing cardiac activity, is the gold standard for diagnosing CVDs. However, ECG is infeasible for continuous cardiac monitoring due to its requirement for user participation. Meanwhile, photoplethysmography (PPG) is easy to collect, but its limited accuracy constrains clinical usage. In this research, a novel Transformer-based architecture, Performer, is proposed to reconstruct ECG from PPG and to create a novel digital biomarker, PPG along with its reconstructed ECG, as multiple modalities for CVD detection. This architecture, for the first time, performs Transformer sequence-to-sequence translation on biomedical waveforms, utilizing the advantages of the easily accessible PPG and the well-studied base of ECG. Shifted Patch-based Attention (SPA) is created to maximize the signal features by feeding the various sequence lengths as hierarchical stages into training, while also capturing cross-patch connections through the shifted-patch mechanism. The architecture achieves a state-of-the-art 0.29 RMSE for reconstructing ECG from PPG, an average diagnostic accuracy of 95.9% for CVDs on the MIMIC III dataset, and 75.9% for diabetes on the PPG-BP dataset. Performer, along with its novel digital biomarker, offers a low-cost and non-invasive solution for continuous cardiac monitoring, requiring only the easily extractable PPG data to reconstruct the less accessible ECG data. As a proof of concept, an earring wearable named PEARL (prototype) is designed to scale up the point-of-care (POC) healthcare system.  ( 2 min )
    An empirical study of the effect of background data size on the stability of SHapley Additive exPlanations (SHAP) for deep learning models. (arXiv:2204.11351v2 [cs.LG] UPDATED)
    Nowadays, the interpretation of why a machine learning (ML) model makes certain inferences is as crucial as the accuracy of those inferences. Some ML models, like decision trees, possess inherent interpretability that can be directly comprehended by humans. Others, like artificial neural networks (ANNs), rely on external methods to uncover the deduction mechanism. SHapley Additive exPlanations (SHAP) is one such external method, which requires a background dataset when interpreting ANNs. Generally, a background dataset consists of instances randomly sampled from the training dataset. However, the sampling size and its effect on SHAP remain unexplored. In our empirical study on the MIMIC-III dataset, we show that the two core explanations, SHAP values and variable rankings, fluctuate when different background datasets acquired from random sampling are used, indicating that users cannot unquestioningly trust the one-shot interpretation from SHAP. Fortunately, such fluctuation decreases as the background dataset size increases. We also notice a U-shape in the stability assessment of SHAP variable rankings, demonstrating that SHAP is more reliable in ranking the most and least important variables than moderately important ones. Overall, our results suggest that users should take into account how background data affect SHAP results, with SHAP stability improving as the background sample size increases.  ( 2 min )
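    The fluctuation measured in the paper is easy to reproduce on a toy linear model (my own synthetic illustration, not the paper's MIMIC-III setup): for f(x) = w·x, the exact SHAP value of feature i is w_i (x_i − mean of background column i), so resampling a small background shifts that mean and the values jitter, while larger backgrounds stabilize them:

```python
import random, statistics

random.seed(0)

# Toy "training set": 1000 rows, 3 features
train = [[random.gauss(0, 1) for _ in range(3)] for _ in range(1000)]
w = [2.0, -1.0, 0.5]                 # linear model f(x) = w . x
x_explained = [1.0, 1.0, 1.0]        # instance being explained

def shap_linear(background):
    # Exact SHAP values for a linear model: w_i * (x_i - E_background[x_i])
    means = [statistics.mean(row[i] for row in background) for i in range(3)]
    return [w[i] * (x_explained[i] - means[i]) for i in range(3)]

def jitter(bg_size, trials=200):
    # Spread of feature 0's SHAP value across random background draws
    vals = [shap_linear(random.sample(train, bg_size))[0] for _ in range(trials)]
    return statistics.pstdev(vals)

small_bg, large_bg = jitter(10), jitter(500)   # instability shrinks with size
```

    The spread of the "same" SHAP value is several times larger with a 10-row background than with a 500-row one, mirroring the paper's observation.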
    Data Debugging with Shapley Importance over End-to-End Machine Learning Pipelines. (arXiv:2204.11131v2 [cs.LG] UPDATED)
    Developing modern machine learning (ML) applications is data-centric, and one fundamental challenge is to understand the influence of data quality on ML training -- "Which training examples are 'guilty' of making the trained ML model's predictions inaccurate or unfair?" Modeling data influence for ML training has attracted intensive interest over the last decade, and one popular framework is to compute the Shapley value of each training example with respect to utilities such as validation accuracy and fairness of the trained ML model. Unfortunately, despite this interest and research, existing methods only consider a single ML model "in isolation" and do not consider an end-to-end ML pipeline that consists of data transformations, feature extractors, and ML training. We present DataScope (ease.ml/datascope), the first system that efficiently computes Shapley values of training examples over an end-to-end ML pipeline, and illustrate its applications in data debugging for ML training. To this end, we first develop a novel algorithmic framework that computes Shapley values over a specific family of ML pipelines that we call canonical pipelines: a positive relational algebra query followed by a K-nearest-neighbor (KNN) classifier. We show that, for many subfamilies of canonical pipelines, computing the Shapley value is in PTIME, contrasting with the exponential complexity of computing Shapley values in general. We then put this into practice -- given an sklearn pipeline, we approximate it with a canonical pipeline to use as a proxy. We conduct extensive experiments illustrating different use cases and utilities. Our results show that DataScope is up to four orders of magnitude faster than state-of-the-art Monte Carlo-based methods, while being comparably, and often even more, effective in data debugging.  ( 2 min )
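    The KNN half of the "canonical pipeline" is what makes PTIME computation possible: for a K-nearest-neighbor classifier, the Shapley value of every training point with respect to a single test prediction has a closed-form recursion (due to Jia et al.; the sketch below is an illustrative toy over 1-D data, not DataScope's actual code):

```python
def knn_shapley(train_xy, x_test, y_test, K=1):
    """Exact Shapley value of each training point for a KNN classifier's
    accuracy on one test point, via the closed-form recursion of Jia et al."""
    N = len(train_xy)
    # Sort training indices by distance to the test point (1-D features here)
    order = sorted(range(N), key=lambda i: abs(train_xy[i][0] - x_test))
    s = [0.0] * N
    s[order[-1]] = (train_xy[order[-1]][1] == y_test) / N
    for rank in range(N - 2, -1, -1):          # walk from farthest to nearest
        i, nxt = order[rank], order[rank + 1]
        diff = (train_xy[i][1] == y_test) - (train_xy[nxt][1] == y_test)
        s[i] = s[nxt] + diff / K * min(K, rank + 1) / (rank + 1)
    return s

# Tiny example: a mislabeled near neighbor should receive a negative value
train = [(0.1, 1), (0.2, 0), (0.9, 1), (1.5, 0)]   # (feature, label) pairs
scores = knn_shapley(train, x_test=0.0, y_test=1, K=1)
```

    The efficiency property is visible directly: the values sum to the utility of the full set (here 1, since the 1-NN prediction is correct), and the mislabeled point at 0.2 gets a negative score.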
    Reinforced Causal Explainer for Graph Neural Networks. (arXiv:2204.11028v2 [cs.LG] UPDATED)
    Explainability is crucial for probing graph neural networks (GNNs), answering questions like "Why does the GNN model make a certain prediction?" Feature attribution is a prevalent technique for highlighting the explanatory subgraph in the input graph that plausibly leads the GNN model to make its prediction. Various attribution methods exploit gradient-like or attention scores as the attributions of edges, then select the salient edges with the top attribution scores as the explanation. However, most of these works make an untenable assumption - that the selected edges are linearly independent - thus leaving the dependencies among edges largely unexplored, especially their coalition effect. We demonstrate unambiguous drawbacks of this assumption - it makes the explanatory subgraph unfaithful and verbose. To address this challenge, we propose a reinforcement learning agent, the Reinforced Causal Explainer (RC-Explainer). It frames the explanation task as a sequential decision process - an explanatory subgraph is successively constructed by adding a salient edge to connect to the previously selected subgraph. Technically, its policy network predicts the action of edge addition and receives a reward that quantifies the action's causal effect on the prediction. Such a reward accounts for the dependency of the newly added edge on the previously added edges, thus reflecting whether they collaborate and form a coalition to pursue better explanations. As such, RC-Explainer is able to generate faithful and concise explanations and has better generalization power to unseen graphs. When explaining different GNNs on three graph classification datasets, RC-Explainer achieves better or comparable performance to SOTA approaches w.r.t. predictive accuracy and contrastivity, and safely passes sanity checks and visual inspections. Code is available at https://github.com/xiangwang1223/reinforced_causal_explainer.  ( 2 min )
    Long-term Spatio-temporal Forecasting via Dynamic Multiple-Graph Attention. (arXiv:2204.11008v2 [cs.LG] UPDATED)
    Many real-world ubiquitous applications, such as parking recommendations and air pollution monitoring, benefit significantly from accurate long-term spatio-temporal forecasting (LSTF). LSTF makes use of long-term dependency between spatial and temporal domains, contextual information, and inherent pattern in the data. Recent studies have revealed the potential of multi-graph neural networks (MGNNs) to improve prediction performance. However, existing MGNN methods cannot be directly applied to LSTF due to several issues: the low level of generality, insufficient use of contextual information, and the imbalanced graph fusion approach. To address these issues, we construct new graph models to represent the contextual information of each node and the long-term spatio-temporal data dependency structure. To fuse the information across multiple graphs, we propose a new dynamic multi-graph fusion module to characterize the correlations of nodes within a graph and the nodes across graphs via the spatial attention and graph attention mechanisms. Furthermore, we introduce a trainable weight tensor to indicate the importance of each node in different graphs. Extensive experiments on two large-scale datasets demonstrate that our proposed approaches significantly improve the performance of existing graph neural network models in LSTF prediction tasks.  ( 2 min )
    Sublinear Time Approximation of Text Similarity Matrices. (arXiv:2112.09631v3 [cs.LG] UPDATED)
    We study algorithms for approximating pairwise similarity matrices that arise in natural language processing. Generally, computing a similarity matrix for $n$ data points requires $\Omega(n^2)$ similarity computations. This quadratic scaling is a significant bottleneck, especially when similarities are computed via expensive functions, e.g., via transformer models. Approximation methods reduce this quadratic complexity, often by using a small subset of exactly computed similarities to approximate the remainder of the complete pairwise similarity matrix. Significant work focuses on the efficient approximation of positive semidefinite (PSD) similarity matrices, which arise e.g., in kernel methods. However, much less is understood about indefinite (non-PSD) similarity matrices, which often arise in NLP. Motivated by the observation that many of these matrices are still somewhat close to PSD, we introduce a generalization of the popular Nystr\"{o}m method to the indefinite setting. Our algorithm can be applied to any similarity matrix and runs in sublinear time in the size of the matrix, producing a rank-$s$ approximation with just $O(ns)$ similarity computations. We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in the downstream tasks of document classification, sentence similarity, and cross-document coreference.  ( 2 min )
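    The Nyström construction that the paper generalizes can be sketched in a few lines: exactly compute an n×s column block C and the s×s landmark block W, then approximate A ≈ C W⁺ Cᵀ using only O(ns) similarity calls. In this toy sketch an RBF similarity on 1-D points stands in for the expensive (e.g. transformer-based) similarity, and a plain 2×2 inverse replaces the pseudo-inverse a real implementation would need:

```python
import math

def similarity(u, v):
    # Stand-in for an expensive similarity function: an RBF kernel on 1-D points
    return math.exp(-(u - v) ** 2)

points = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]   # two tight clusters
n = len(points)
landmarks = [0, 3]                         # s = 2 sampled landmark indices

# C: n x s block of exactly computed similarities -> O(ns) evaluations total
C = [[similarity(points[i], points[j]) for j in landmarks] for i in range(n)]
# W: s x s block of similarities among the landmarks themselves
W = [[similarity(points[i], points[j]) for j in landmarks] for i in landmarks]

# Invert the 2x2 W explicitly (a real implementation uses a pseudo-inverse)
det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
Winv = [[W[1][1] / det, -W[0][1] / det],
        [-W[1][0] / det, W[0][0] / det]]

def approx(i, j):
    # Entry (i, j) of the rank-s approximation C @ Winv @ C.T
    return sum(C[i][a] * Winv[a][b] * C[j][b] for a in range(2) for b in range(2))

err = abs(approx(1, 4) - similarity(points[1], points[4]))   # never computed exactly
```

    Entries touching a landmark are reproduced exactly, and the remaining entries are recovered from just the 2 sampled columns, which is the source of the sublinear cost.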
    AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees. (arXiv:2201.07984v2 [cs.AI] UPDATED)
    Using pre-trained language models (i.e., BERT) to comprehend source code has attracted increasing attention from financial institutions owing to their great potential to uncover financial risks. However, there are several challenges in applying these language models to directly solve programming language (PL) related problems. To this end, we propose AstBERT, a pre-trained language model aiming to better understand financial PL using the abstract syntax tree (AST). Specifically, we collect a large amount of source code (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model with the help of code parsers, through which AST information of the source code can be interpreted and integrated. We evaluate the performance of the proposed model on three tasks: code question answering, code clone detection, and code refinement. Experimental results show that our AstBERT achieves promising performance on all three downstream tasks.  ( 2 min )
    ROMNet: Renovate the Old Memories. (arXiv:2202.02606v2 [eess.IV] UPDATED)
    Renovating the memories in old photos is an intriguing research topic in computer vision. These legacy images often suffer from severe and commingled degradations such as cracks, noise, and color fading, while the lack of large-scale paired old-photo datasets makes this restoration task very challenging. In this work, we present a novel reference-based end-to-end learning framework that can jointly repair and colorize degraded legacy pictures. Specifically, the proposed framework consists of three modules: a restoration sub-network for degradation restoration, a similarity sub-network for color histogram matching and transfer, and a colorization sub-network that learns to predict the chroma elements of the images conditioned on chromatic reference signals. The whole system takes advantage of the color histogram priors in a given reference image, which vastly reduces the dependency on large-scale training data. Apart from the proposed method, we also create, to our knowledge, the first public real-world old-photo dataset with paired ground truth for evaluating old photo restoration models, wherein each old photo is paired with a manually restored pristine image by Photoshop experts. Our extensive experiments conducted on both synthetic and real-world datasets demonstrate that our method significantly outperforms the state of the art both quantitatively and qualitatively.  ( 2 min )
    LaPraDoR: Unsupervised Pretrained Dense Retriever for Zero-Shot Text Retrieval. (arXiv:2203.06169v2 [cs.CL] UPDATED)
    In this paper, we propose LaPraDoR, a pretrained dual-tower dense retriever that does not require any supervised data for training. Specifically, we first present Iterative Contrastive Learning (ICoL) that iteratively trains the query and document encoders with a cache mechanism. ICoL not only enlarges the number of negative instances but also keeps representations of cached examples in the same hidden space. We then propose Lexicon-Enhanced Dense Retrieval (LEDR) as a simple yet effective way to enhance dense retrieval with lexical matching. We evaluate LaPraDoR on the recently proposed BEIR benchmark, including 18 datasets of 9 zero-shot text retrieval tasks. Experimental results show that LaPraDoR achieves state-of-the-art performance compared with supervised dense retrieval models, and further analysis reveals the effectiveness of our training strategy and objectives. Compared to re-ranking, our lexicon-enhanced approach can be run in milliseconds (22.5x faster) while achieving superior performance.  ( 2 min )
    Accelerated Proximal Alternating Gradient-Descent-Ascent for Nonconvex Minimax Machine Learning. (arXiv:2112.11663v6 [cs.LG] UPDATED)
    Alternating gradient-descent-ascent (AltGDA) is an optimization algorithm that has been widely used for model training in various machine learning applications, which aims to solve a nonconvex minimax optimization problem. However, existing studies show that it suffers from high computational complexity in nonconvex minimax optimization. In this paper, we develop a single-loop and fast AltGDA-type algorithm that leverages proximal gradient updates and momentum acceleration to solve regularized nonconvex minimax optimization problems. By leveraging the momentum acceleration technique, we prove that the algorithm converges to a critical point in nonconvex minimax optimization and achieves a computational complexity of order $\mathcal{O}(\kappa^{\frac{11}{6}}\epsilon^{-2})$, where $\epsilon$ is the desired level of accuracy and $\kappa$ is the problem's condition number. Such a computational complexity improves upon the state-of-the-art complexities of single-loop GDA and AltGDA algorithms (see the comparison table in the paper). We demonstrate the effectiveness of our algorithm via an experiment on adversarial deep learning.  ( 2 min )
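    The alternating update at the heart of AltGDA can be sketched on a toy convex-concave minimax problem (a generic illustration; the proximal updates and momentum acceleration of the paper's algorithm are omitted, and the objective below is a hypothetical stand-in for the paper's nonconvex problems):

```python
# Toy minimax objective f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
# (convex in x, concave in y), with its saddle point at (0, 0).
def grad_x(x, y):   # df/dx
    return x + y

def grad_y(x, y):   # df/dy
    return x - y

def alt_gda(x, y, lr=0.1, steps=200):
    """Alternating GDA: a descent step on x, then an ascent step on y
    that already uses the freshly updated x (the 'alternating' part)."""
    for _ in range(steps):
        x = x - lr * grad_x(x, y)   # gradient descent on the min variable
        y = y + lr * grad_y(x, y)   # gradient ascent on the max variable
    return x, y

x_star, y_star = alt_gda(1.0, 1.0)  # spirals in toward the saddle point (0, 0)
```

    Using the updated x inside the ascent step is what distinguishes AltGDA from simultaneous GDA, and on this toy problem it damps the rotation around the saddle point.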
    Closing the Gap between Single-User and Multi-User VoiceFilter-Lite. (arXiv:2202.12169v2 [eess.AS] UPDATED)
    VoiceFilter-Lite is a speaker-conditioned voice separation model that plays a crucial role in improving speech recognition and speaker verification by suppressing overlapping speech from non-target speakers. However, one limitation of VoiceFilter-Lite, and other speaker-conditioned speech models in general, is that these models are usually limited to a single target speaker. This is undesirable as most smart home devices now support multiple enrolled users. In order to extend the benefits of personalization to multiple users, we previously developed an attention-based speaker selection mechanism and applied it to VoiceFilter-Lite. However, the original multi-user VoiceFilter-Lite model suffers from significant performance degradation compared with single-user models. In this paper, we devised a series of experiments to improve the multi-user VoiceFilter-Lite model. By incorporating a dual learning rate schedule and by using feature-wise linear modulation (FiLM) to condition the model with the attended speaker embedding, we successfully closed the performance gap between multi-user and single-user VoiceFilter-Lite models on single-speaker evaluations. At the same time, the new model can also be easily extended to support any number of users, and significantly outperforms our previously published model on multi-speaker evaluations.  ( 2 min )
    GNMR: A provable one-line algorithm for low rank matrix recovery. (arXiv:2106.12933v3 [math.OC] UPDATED)
    Low rank matrix recovery problems, including matrix completion and matrix sensing, appear in a broad range of applications. In this work we present GNMR -- an extremely simple iterative algorithm for low rank matrix recovery, based on a Gauss-Newton linearization. On the theoretical front, we derive recovery guarantees for GNMR in both the matrix sensing and matrix completion settings. Some of these results improve upon the best currently known for other methods. A key property of GNMR is that it implicitly keeps the factor matrices approximately balanced throughout its iterations. On the empirical front, we show that for matrix completion with uniform sampling, GNMR performs better than several popular methods, especially when given very few observations close to the information limit.  ( 2 min )
    Differentially Private SGDA for Minimax Problems. (arXiv:2201.09046v3 [cs.LG] UPDATED)
    Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To the best of our knowledge, this is the first-ever-known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting, which is the first-ever-known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.  ( 2 min )
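    A minimal sketch of the kind of update DP-SGDA performs: clip each gradient to bound per-step sensitivity, then add Gaussian noise before the descent/ascent step. The clipping threshold and noise scale below are illustrative placeholders, not the calibration analyzed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_clip(g, c):
    """Scale g so that ||g||_2 <= c, bounding the sensitivity of one update."""
    norm = np.linalg.norm(g)
    return g * min(1.0, c / norm) if norm > 0 else g

def dp_sgda_step(x, y, grad_x, grad_y, lr=0.05, clip=1.0, sigma=0.5):
    """One noisy descent step in x and ascent step in y, Gaussian-mechanism
    style: noise standard deviation proportional to sigma * clip."""
    gx = l2_clip(grad_x(x, y), clip) + sigma * clip * rng.standard_normal(x.shape)
    gy = l2_clip(grad_y(x, y), clip) + sigma * clip * rng.standard_normal(y.shape)
    return x - lr * gx, y + lr * gy
```

    The clipping step is what makes the Gaussian noise meaningful for privacy: without a bound on each gradient's norm, no finite noise level would mask an individual example's contribution.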
    ISNet: Costless and Implicit Image Segmentation for Deep Classifiers, with Application in COVID-19 Detection. (arXiv:2202.00232v3 [eess.IV] UPDATED)
    This work proposes a novel deep neural network (DNN) architecture, Implicit Segmentation Neural Network (ISNet), to solve the task of image segmentation followed by classification. It substitutes the common pipeline of two DNNs with a single model. We designed the ISNet for high flexibility and performance: it allows virtually any classification neural network architecture to analyze a common image as if it had been previously segmented. Furthermore, in relation to the unmodified classifier, the ISNet does not cause any increment in computational cost at run-time. We test the architecture with two applications: COVID-19 detection in chest X-rays, and facial attribute estimation. We implement an ISNet based on a DenseNet121 classifier, and compare the model to a U-net (performing lung/face segmentation) followed by a DenseNet121, and to a standalone DenseNet121. The new architecture matched the other DNNs in facial attribute estimation. Moreover, it strongly surpassed them in COVID-19 detection, according to an external test dataset. The ISNet precisely ignored the image regions outside of the lungs or faces. Therefore, in COVID-19 detection it reduced the effects of background bias and shortcut learning, and it improved security in facial attribute estimation. ISNet presents an accurate, fast, and light methodology. The successful implicit segmentation, considering two largely diverse fields, highlights the architecture's general applicability.  ( 3 min )
    Variational Learning for Unsupervised Knowledge Grounded Dialogs. (arXiv:2112.00653v3 [cs.CL] UPDATED)
    Recent methods for knowledge grounded dialogs generate responses by incorporating information from an external textual document. These methods do not require the exact document to be known during training and rely on the use of a retrieval system to fetch relevant documents from a large index. The documents used to generate the responses are modeled as latent variables whose prior probabilities need to be estimated. Models such as RAG and REALM marginalize the document probabilities over the documents retrieved from the index to define the log likelihood loss function, which is optimized end-to-end. In this paper, we develop a variational approach to the above technique wherein we instead maximize the Evidence Lower Bound (ELBO). Using a collection of three publicly available open-conversation datasets, we demonstrate how the posterior distribution, which has information from the ground-truth response, allows for a better approximation of the objective function during training. To overcome the challenges associated with sampling over a large knowledge collection, we develop an efficient approach to approximate the ELBO. To the best of our knowledge we are the first to apply variational training for open-scale unsupervised knowledge grounded dialog systems.  ( 2 min )
    Single-pass Object-adaptive Data Undersampling and Reconstruction for MRI. (arXiv:2111.09212v2 [eess.IV] UPDATED)
    There is much recent interest in techniques to accelerate the data acquisition process in MRI by acquiring limited measurements. Often sophisticated reconstruction algorithms are deployed to maintain high image quality in such settings. In this work, we propose a data-driven sampler using a convolutional neural network, MNet, to provide object-specific sampling patterns adaptive to each scanned object. The network observes very limited low-frequency k-space data for each object and rapidly predicts the desired undersampling pattern in one go that achieves high image reconstruction quality. We propose an accompanying alternating-type training framework with a mask-backward procedure that efficiently generates training labels for the sampler network and jointly trains an image reconstruction network. Experimental results on the fastMRI knee dataset demonstrate the ability of the proposed learned undersampling network to generate object-specific masks at fourfold and eightfold acceleration that achieve better image reconstruction performance than several existing schemes. The source code for the proposed joint sampling and reconstruction learning framework is available at https://github.com/zhishenhuang/mri.  ( 2 min )
    Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters. (arXiv:2111.01409v2 [stat.ML] UPDATED)
    It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The reparameterisation trick was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the reparameterisation trick to include the stochastic input to resampling, thereby limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different numbers of steps, and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.  ( 2 min )
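    For the plain sampling step, the reparameterisation trick the paper builds on can be sketched for a Gaussian (a generic illustration of the trick itself, not the paper's extension to resampling):

```python
import numpy as np

rng = np.random.default_rng(1)

def reparam_sample(mu, sigma, n):
    """Express x ~ N(mu, sigma^2) as a deterministic, differentiable function
    of the parameters: x = mu + sigma * eps with parameter-free eps ~ N(0, 1).
    This makes dx/dmu = 1 and dx/dsigma = eps well defined, whereas drawing
    from N(mu, sigma^2) directly is not differentiable in (mu, sigma)."""
    eps = rng.standard_normal(n)   # the stochastic input, independent of mu, sigma
    return mu + sigma * eps

x = reparam_sample(2.0, 0.5, 100_000)  # samples with mean ~2.0 and std ~0.5
```

    The paper's contribution is to treat the stochastic input to the *resampling* step in the same parameter-free way, so that gradients can flow through more of the particle filter.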
    Building separable approximations for quantum states via neural networks. (arXiv:2112.08055v4 [quant-ph] UPDATED)
    Finding the closest separable state to a given target state is a notoriously difficult task, even more difficult than deciding whether a state is entangled or separable. To tackle this task, we parametrize separable states with a neural network and train it to minimize the distance to a given target state, with respect to a differentiable distance, such as the trace distance or Hilbert--Schmidt distance. By examining the output of the algorithm, we obtain an upper bound on the entanglement of the target state, and construct an approximation for its closest separable state. We benchmark the method on a variety of well-known classes of bipartite states and find excellent agreement, even up to local dimension of $d=10$, while providing conjectures and analytic insight for isotropic and Werner states. Moreover, we show our method to be efficient in the multipartite case, considering different notions of separability. Examining three and four-party GHZ and W states we recover known bounds and obtain novel ones, for instance for triseparability.  ( 2 min )
    Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis. (arXiv:2111.15186v2 [cs.LG] UPDATED)
    Obtaining annotations for large training sets is expensive, especially in settings where domain knowledge is required, such as behavior analysis. Weak supervision has been studied to reduce annotation costs by using weak labels from task-specific labeling functions (LFs) to augment ground truth labels. However, domain experts still need to hand-craft different LFs for different tasks, limiting scalability. To reduce expert effort, we present AutoSWAP: a framework for automatically synthesizing data-efficient task-level LFs. The key to our approach is to efficiently represent expert knowledge in a reusable domain-specific language and more general domain-level LFs, with which we use state-of-the-art program synthesis techniques and a small labeled dataset to generate task-level LFs. Additionally, we propose a novel structural diversity cost that allows for efficient synthesis of diverse sets of LFs, further improving AutoSWAP's performance. We evaluate AutoSWAP in three behavior analysis domains and demonstrate that AutoSWAP outperforms existing approaches using only a fraction of the data. Our results suggest that AutoSWAP is an effective way to automatically generate LFs that can significantly reduce expert effort for behavior analysis.  ( 2 min )
    Regularized Newton Method with Global $O(1/k^2)$ Convergence. (arXiv:2112.02089v2 [math.OC] UPDATED)
    We present a Newton-type method that converges fast from any initialization and for arbitrary convex objectives with Lipschitz Hessians. We achieve this by merging the ideas of cubic regularization with a certain adaptive Levenberg--Marquardt penalty. In particular, we show that the iterates given by $x^{k+1}=x^k - \bigl(\nabla^2 f(x^k) + \sqrt{H\|\nabla f(x^k)\|} \mathbf{I}\bigr)^{-1}\nabla f(x^k)$, where $H>0$ is a constant, converge globally with a $\mathcal{O}(\frac{1}{k^2})$ rate. Our method is the first variant of Newton's method that has both cheap iterations and provably fast global convergence. Moreover, we prove that locally our method converges superlinearly when the objective is strongly convex. To boost the method's performance, we present a line search procedure that does not need hyperparameters and is provably efficient.  ( 2 min )
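    The quoted iterate is simple to implement; below is a sketch on a small strongly convex test function with Lipschitz Hessian (the test function and the choice $H=1$ are illustrative, not taken from the paper):

```python
import numpy as np

# f(x) = 0.25 * ||x||^4 + 0.5 * ||x||^2, minimized at x = 0.
def grad(x):
    return (x @ x) * x + x

def hess(x):
    n = x.size
    return (x @ x) * np.eye(n) + 2.0 * np.outer(x, x) + np.eye(n)

def regularized_newton(x, H=1.0, iters=20):
    """x_{k+1} = x_k - (hess(x_k) + sqrt(H * ||grad(x_k)||) * I)^{-1} grad(x_k),
    i.e. Newton's method with an adaptive Levenberg-Marquardt penalty that
    shrinks as the gradient norm shrinks."""
    for _ in range(iters):
        g = grad(x)
        lam = np.sqrt(H * np.linalg.norm(g))
        x = x - np.linalg.solve(hess(x) + lam * np.eye(x.size), g)
    return x

x_min = regularized_newton(np.array([3.0, -2.0]))  # converges to the origin
```

    Because the penalty $\sqrt{H\|\nabla f(x^k)\|}$ vanishes near a solution, the iteration recovers plain Newton steps locally, which is where the superlinear rate comes from.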
    Physics-Driven Learning of Wasserstein GAN for Density Reconstruction in Dynamic Tomography. (arXiv:2110.15424v2 [eess.IV] UPDATED)
    Object density reconstruction from projections containing scattered radiation and noise is of critical importance in many applications. Existing scatter correction and density reconstruction methods may not provide the high accuracy needed in many applications and can break down in the presence of unmodeled or anomalous scatter and other experimental artifacts. Incorporating machine-learned models could prove beneficial for accurate density reconstruction particularly in dynamic imaging, where the time-evolution of the density fields could be captured by partial differential equations or by learning from hydrodynamics simulations. In this work, we demonstrate the ability of learned deep neural networks to perform artifact removal in noisy density reconstructions, where the noise is imperfectly characterized. We use a Wasserstein generative adversarial network (WGAN), where the generator serves as a denoiser that removes artifacts in densities obtained from traditional reconstruction algorithms. We train the networks from large density time-series datasets, with noise simulated according to parametric random distributions that may mimic noise in experiments. The WGAN is trained with noisy density frames as generator inputs, to match the generator outputs to the distribution of clean densities (time-series) from simulations. A supervised loss is also included in the training, which leads to improved density restoration performance. In addition, we employ physics-based constraints such as mass conservation during network training and application to further enable highly accurate density reconstructions. Our preliminary numerical results show that the models trained in our frameworks can remove significant portions of unknown noise in density time-series data.  ( 2 min )
    A Study of Fake News Reading and Annotating in Social Media Context. (arXiv:2109.12523v2 [cs.HC] UPDATED)
    The online spreading of fake news is a major issue threatening entire societies. Much of this spreading is enabled by new media formats, namely social networks and online media sites. Researchers and practitioners have been trying to address this by characterizing the fake news and devising automated methods for detecting them. Detection methods have so far had only limited success, mostly due to the complexity of the news content and context and the lack of properly annotated datasets. One possible way to boost the efficiency of automated misinformation detection methods is to imitate the detection work of humans. It is also important to understand the news consumption behavior of online users. In this paper, we present an eye-tracking study in which we let 44 lay participants casually read through a social media feed containing posts with news articles, some of which were fake. In a second run, we asked the participants to decide on the truthfulness of these articles. We also describe a follow-up qualitative study with a similar scenario, but this time with 7 expert fake news annotators. We present descriptions of both studies, the characteristics of the resulting dataset (which we hereby publish), and several findings.  ( 2 min )
    Encoding Involutory Invariances in Neural Networks. (arXiv:2106.12891v2 [cs.LG] UPDATED)
    In certain situations, neural networks are trained upon data that obey underlying symmetries. However, the predictions do not respect the symmetries exactly unless embedded in the network structure. In this work, we introduce architectures that embed a special kind of symmetry, namely invariance with respect to involutory linear/affine transformations up to parity $p=\pm 1$. We provide rigorous theorems to show that the proposed network ensures such an invariance and present qualitative arguments for a special universal approximation theorem. An adaptation of our techniques to CNN tasks for datasets with inherent horizontal/vertical reflection symmetry is demonstrated. Extensive experiments indicate that the proposed model outperforms baseline feed-forward and physics-informed neural networks while identically respecting the underlying symmetry.  ( 2 min )
    Optimal Epidemic Control as a Contextual Combinatorial Bandit with Budget. (arXiv:2106.15808v2 [cs.LG] UPDATED)
    In light of the COVID-19 pandemic, it is an open challenge and critical practical problem to find an optimal way to dynamically prescribe the best policies that balance both governmental resources and epidemic control in different countries and regions. To solve this multi-dimensional tradeoff of exploitation and exploration, we formulate this technical challenge as a contextual combinatorial bandit problem that jointly optimizes a multi-criteria reward function. Given the historical daily cases in a region and the past intervention plans in place, the agent should generate useful intervention plans that policy makers can implement in real time to minimize both the number of daily COVID-19 cases and the stringency of the recommended interventions. We validate this concept with simulations of multiple realistic policy making scenarios and demonstrate a clear advantage in providing a Pareto-optimal solution in the epidemic intervention problem.  ( 2 min )
    Leveraging power grid topology in machine learning assisted optimal power flow. (arXiv:2110.00306v3 [cs.LG] UPDATED)
    Machine learning assisted optimal power flow (OPF) aims to reduce the computational complexity of these non-linear and non-convex constrained optimization problems by consigning expensive (online) optimization to offline training. The majority of work in this area typically employs fully connected neural networks (FCNN). However, recently convolutional (CNN) and graph (GNN) neural networks have also been investigated, in an effort to exploit topological information within the power grid. Although promising results have been obtained, the literature lacks a systematic comparison between these architectures. Accordingly, we introduce a concise framework for generalizing methods for machine learning assisted OPF and assess the performance of a variety of FCNN, CNN and GNN models for two fundamental approaches in this domain: regression (predicting optimal generator set-points) and classification (predicting the active set of constraints). For several synthetic power grids with interconnected utilities, we show that locality properties between feature and target variables are scarce and subsequently demonstrate marginal utility of applying CNN and GNN architectures compared to FCNN for a fixed grid topology. However, with variable topology (for instance, modeling transmission line contingency), GNN models are able to straightforwardly take the change of topological information into account and outperform both FCNN and CNN models.  ( 2 min )
    Maximum Entropy Dueling Network Architecture in Atari Domain. (arXiv:2107.14457v2 [cs.LG] UPDATED)
    In recent years, many deep architectures have been proposed for reinforcement learning, mainly for value function estimation and representation learning. These methods achieved great success in the Atari 2600 domain. In this paper, we propose an improved architecture based upon Dueling Networks; in this architecture, there are two separate estimators: one approximates the state value function and the other the state advantage function. This improvement, based on Maximum Entropy, shows better policy evaluation compared to the original network and other value-based architectures in the Atari domain.  ( 2 min )
    Back2Future: Leveraging Backfill Dynamics for Improving Real-time Predictions in Future. (arXiv:2106.04420v8 [cs.LG] UPDATED)
    In real-time forecasting in public health, data collection is a non-trivial and demanding task. Often, after data is initially released, it undergoes several revisions (perhaps due to human or technical constraints); as a result, it may take weeks until the data reaches a stable value. This so-called 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature. In this paper, we introduce the multi-variate backfill problem using COVID-19 as the motivating example. We construct a detailed dataset composed of relevant signals over the past year of the pandemic. We then systematically characterize several patterns in backfill dynamics and leverage our observations to formulate a novel problem and a neural framework, Back2Future, that aims to refine a given model's predictions in real time. Our extensive experiments demonstrate that our method refines the performance of top models for COVID-19 forecasting, in contrast to non-trivial baselines, yielding an 18% improvement over baselines and enabling us to obtain new state-of-the-art performance. In addition, we show that our model improves model evaluation too; hence policy-makers can better understand the true accuracy of forecasting models in real time.  ( 3 min )
    SALIENCE: An Unsupervised User Adaptation Model for Multiple Wearable Sensors Based Human Activity Recognition. (arXiv:2108.10213v2 [eess.SP] UPDATED)
    Unsupervised user adaptation aligns the feature distributions of the data from training users and the new user, so a well-trained wearable human activity recognition (WHAR) model can be well adapted to the new user. With the development of wearable sensors, WHAR based on multiple wearable sensors is gaining increasing attention. In order to address the challenge that the transferabilities of different sensors differ, we propose the SALIENCE model (an unsupervised user adaptation model for human activity recognition based on multiple wearable sensors). It aligns the data of each sensor separately to achieve local alignment, while uniformly aligning the data of all sensors to ensure global alignment. In addition, an attention mechanism is proposed to focus the activity classifier of SALIENCE on the sensors with strong feature discrimination and well-aligned distributions. Experiments are conducted on two public WHAR datasets, and the experimental results show that our model can yield a competitive performance.  ( 2 min )
    SeqDialN: Sequential Visual Dialog Networks in Joint Visual-Linguistic Representation Space. (arXiv:2008.00397v2 [cs.CV] UPDATED)
    In this work, we formulate a visual dialog as an information flow in which each piece of information is encoded with the joint visual-linguistic representation of a single dialog round. Based on this formulation, we consider the visual dialog task as a sequence problem consisting of ordered visual-linguistic vectors. For featurization, we use a Dense Symmetric Co-Attention network as a lightweight vision-language joint representation generator to fuse multimodal features (i.e., image and text), yielding better computation and data efficiencies. For inference, we propose two Sequential Dialog Networks (SeqDialN): the first uses LSTM for information propagation (IP) and the second uses a modified Transformer for multi-step reasoning (MR). Our architecture separates the complexity of multimodal feature fusion from that of inference, which allows simpler design of the inference engine. IP based SeqDialN is our baseline with a simple 2-layer LSTM design that achieves decent performance. MR based SeqDialN, on the other hand, recurrently refines the semantic question/history representations through the self-attention stack of Transformer and produces promising results on the visual dialog task. On VisDial v1.0 test-std dataset, our best single generative SeqDialN achieves 62.54% NDCG and 48.63% MRR; our ensemble generative SeqDialN achieves 63.78% NDCG and 49.98% MRR, which sets a new state of the art among generative visual dialog models. We fine-tune discriminative SeqDialN with dense annotations and boost the performance up to 72.41% NDCG and 55.11% MRR. In this work, we discuss the extensive experiments we have conducted to demonstrate the effectiveness of our model components. We also provide visualization for the reasoning process from the relevant conversation rounds and discuss our fine-tuning methods. Our code is available at https://github.com/xiaoxiaoheimei/SeqDialN  ( 3 min )
    IH-GAN: A Conditional Generative Model for Implicit Surface-Based Inverse Design of Cellular Structures. (arXiv:2103.02588v4 [cs.CE] UPDATED)
    Variable-density cellular structures can overcome connectivity and manufacturability issues of topologically optimized structures, particularly those represented as discrete density maps. However, the optimization of such cellular structures is challenging due to the multiscale design problem. Past work addressing this problem generally either only optimizes the volume fraction of single-type unit cells but ignores the effects of unit cell geometry on properties, or considers the geometry-property relation but builds this relation via heuristics. In contrast, we propose a simple yet more principled way to accurately model the property to geometry mapping using a conditional deep generative model, named Inverse Homogenization Generative Adversarial Network (IH-GAN). It learns the conditional distribution of unit cell geometries given properties and can realize the one-to-many mapping from properties to geometries. We further reduce the complexity of IH-GAN by using the implicit function parameterization to represent unit cell geometries. Results show that our method can 1) generate various unit cells that satisfy given material properties with high accuracy ($R^2$-scores between target properties and properties of generated unit cells $>98\%$) and 2) improve the optimized structural performance over the conventional variable-density single-type structure. In the minimum compliance example, our IH-GAN generated structure achieves a $79.7\%$ reduction in concentrated stress and an extra $3.03\%$ reduction in displacement. In the target deformation examples, our IH-GAN generated structure reduces the target matching error by $86.4\%$ and $79.6\%$ for two test cases, respectively. We also demonstrated that the connectivity issue for multi-type unit cells can be solved by transition layer blending.  ( 3 min )
    Unify Local and Global Information for Top-$N$ Recommendation. (arXiv:2012.01635v2 [cs.IR] UPDATED)
    Knowledge graph (KG), integrating complex information and containing rich semantics, is widely considered as side information to enhance recommendation systems. However, most of the existing KG-based methods concentrate on encoding the structural information in the graph, without utilizing the collaborative signals in user-item interaction data, which are important for understanding user preferences. Therefore, the representations learned by these models are insufficient for representing semantic information of users and items in the recommendation environment. The combination of both kinds of data provides a good chance to solve this problem. To tackle this research gap, we propose a novel duet representation learning framework named KADM to fuse local information (user-item interaction data) and global information (external knowledge graph) for the top-$N$ recommendation, which is composed of two separate sub-models. One learns the local representations by discovering the inner correlations in local information with a knowledge-aware co-attention mechanism, and another learns the global representations by encoding the knowledge associations in global information with a relation-aware attention network. The two sub-models are jointly trained as part of the semantic fusion network to compute the user preferences, which weighs the contributions of the two sub-models under the given context. We conduct experiments on two real-world datasets, and the evaluations show that KADM significantly outperforms state-of-the-art methods. Further ablation studies confirm that the duet architecture performs significantly better than either sub-model on the recommendation tasks.  ( 2 min )
    Variational Kalman Filtering with Hinf-Based Correction for Robust Bayesian Learning in High Dimensions. (arXiv:2204.13089v1 [stat.ML])
    In this paper, we address the problem of convergence of the sequential variational inference filter (VIF) through the application of a robust variational objective and Hinf-norm based correction for a linear Gaussian system. As the dimension of state or parameter space grows, performing the full Kalman update with the dense covariance matrix for a large-scale system requires increased storage and computational complexity, making it impractical. The VIF approach, based on mean-field Gaussian variational inference, reduces this burden through the variational approximation to the covariance, usually in the form of a diagonal covariance approximation. The challenge is to retain convergence and correct for biases introduced by the sequential VIF steps. We desire a framework that improves feasibility while still maintaining reasonable proximity to the optimal Kalman filter as data is assimilated. To accomplish this goal, a Hinf-norm based optimization perturbs the VIF covariance matrix to improve robustness. This yields a novel VIF-Hinf recursion that employs consecutive variational inference and Hinf based optimization steps. We explore the development of this method and investigate a numerical example to illustrate the effectiveness of the proposed filter.  ( 2 min )
    Exoskeleton-Based Multimodal Action and Movement Recognition: Identifying and Developing the Optimal Boosted Learning Approach. (arXiv:2106.10331v2 [cs.RO] UPDATED)
    This paper makes two scientific contributions to the field of exoskeleton-based action and movement recognition. First, it presents a novel machine learning and pattern recognition-based framework that can detect a wide range of actions and movements - walking, walking upstairs, walking downstairs, sitting, standing, lying, stand to sit, sit to stand, sit to lie, lie to sit, stand to lie, and lie to stand - with an overall accuracy of 82.63%. Second, it presents a comprehensive comparative study of different learning approaches - Random Forest, Artificial Neural Network, Decision Tree, Multiway Decision Tree, Support Vector Machine, k-NN, Gradient Boosted Trees, Decision Stump, AutoMLP, Linear Regression, Vector Linear Regression, Random Tree, Na\"ive Bayes, Na\"ive Bayes (Kernel), Linear Discriminant Analysis, Quadratic Discriminant Analysis, and Deep Learning - applied to this framework. The performance of each of these learning approaches was boosted using the AdaBoost algorithm, and the Cross Validation approach was used for training and testing. The results show that in boosted form, the k-NN classifier outperforms all the other boosted learning approaches and is, therefore, the optimal learning method for this purpose. The results presented and discussed underscore the relevance of this work for augmenting exoskeleton-based assisted and independent living of the elderly in future Internet of Things-based living environments, such as Smart Homes. As a specific use case, we also discuss how the findings of our work are relevant for augmenting the capabilities of the Hybrid Assistive Limb exoskeleton, a highly functional lower limb exoskeleton.  ( 2 min )
    Residual Contrastive Learning for Image Reconstruction: Learning Transferable Representations from Noisy Images. (arXiv:2106.10070v2 [cs.CV] UPDATED)
    This paper is concerned with contrastive learning (CL) for low-level image restoration and enhancement tasks. We propose a new label-efficient learning paradigm based on residuals, residual contrastive learning (RCL), and derive an unsupervised visual representation learning framework, suitable for low-level vision tasks with noisy inputs. While supervised image reconstruction aims to minimize residual terms directly, RCL alternatively builds a connection between residuals and CL by defining a novel instance discrimination pretext task, using residuals as the discriminative feature. Our formulation mitigates the severe task misalignment between instance discrimination pretext tasks and downstream image reconstruction tasks, present in existing CL frameworks. Experimentally, we find that RCL can learn robust and transferable representations that improve the performance of various downstream tasks, such as denoising and super resolution, in comparison with recent self-supervised methods designed specifically for noisy inputs. Additionally, our unsupervised pre-training can significantly reduce annotation costs whilst maintaining performance competitive with fully-supervised image reconstruction.  ( 2 min )
    Neural String Edit Distance. (arXiv:2104.08388v2 [cs.CL] UPDATED)
    We propose the neural string edit distance model for string-pair matching and string transduction based on learnable string edit distance. We modify the original expectation-maximization learned edit distance algorithm into a differentiable loss function, allowing us to integrate it into a neural network providing a contextual representation of the input. We evaluate on cognate detection, transliteration, and grapheme-to-phoneme conversion, and show that we can trade off between performance and interpretability in a single framework. Using contextual representations, which are difficult to interpret, we match the performance of state-of-the-art string-pair matching models. Using static embeddings and a slightly different loss function, we force interpretability, at the expense of an accuracy drop.  ( 2 min )
    Federated Reconstruction: Partially Local Federated Learning. (arXiv:2102.03448v6 [cs.LG] UPDATED)
    Personalization methods in federated learning aim to balance the benefits of federated and local training for data availability, communication cost, and robustness to client heterogeneity. Approaches that require clients to communicate all model parameters can be undesirable due to privacy and communication constraints. Other approaches require always-available or stateful clients, impractical in large-scale cross-device settings. We introduce Federated Reconstruction, the first model-agnostic framework for partially local federated learning suitable for training and inference at scale. We motivate the framework via a connection to model-agnostic meta learning, empirically demonstrate its performance over existing approaches for collaborative filtering and next word prediction, and release an open-source library for evaluating approaches in this setting. We also describe the successful deployment of this approach at scale for federated collaborative filtering in a mobile keyboard application.  ( 2 min )
    Rethinking the Promotion Brought by Contrastive Learning to Semi-Supervised Node Classification. (arXiv:2012.07437v2 [cs.LG] UPDATED)
    Graph Contrastive Learning (GCL) has proven highly effective in promoting the performance of Semi-Supervised Node Classification (SSNC). However, existing GCL methods are generally transferred from other fields like CV or NLP, and their underlying working mechanism remains under-explored. In this work, we first deeply probe the working mechanism of GCL in SSNC, and find that the promotion brought by GCL is severely unevenly distributed: the improvement mainly comes from subgraphs with less annotated information, which is fundamentally different from contrastive learning in other fields. However, existing GCL methods generally ignore this uneven distribution of annotated information and apply GCL evenly to the whole graph. To remedy this issue and further improve GCL in SSNC, we propose the Topology InFormation gain-Aware Graph Contrastive Learning (TIFA-GCL) framework that considers the annotated information distribution across the graph in GCL. Extensive experiments on six benchmark graph datasets, including the enormous OGB-Products graph, show that TIFA-GCL can bring a larger improvement than existing GCL methods in both transductive and inductive settings. Further experiments demonstrate the generalizability and interpretability of TIFA-GCL.  ( 2 min )
    Towards assessing agricultural land suitability with causal machine learning. (arXiv:2204.12956v1 [cs.LG])
    Understanding the suitability of agricultural land for applying specific management practices is of great importance for sustainable and resilient agriculture against climate change. Recent developments in the field of causal machine learning enable the estimation of intervention impacts on an outcome of interest, for samples described by a set of observed characteristics. We introduce an extensible data-driven framework that leverages earth observations and frames agricultural land suitability as a geospatial impact assessment problem, where the estimated effects of agricultural practices on agroecosystems serve as a land suitability score and guide decision making. We formulate this as a causal machine learning task and discuss how this approach can be used for agricultural planning in a changing climate. Specifically, we extract the agricultural management practices of "crop rotation" and "landscape crop diversity" from crop type maps, account for climate and land use data, and use double machine learning to estimate their heterogeneous effect on Net Primary Productivity (NPP), within the Flanders region of Belgium from 2010 to 2020. We find that the effect of crop rotation was insignificant, while landscape crop diversity had a small negative effect on NPP. Finally, we observe considerable effect heterogeneity in space for both practices and analyze it.  ( 2 min )
    Differentially Quantized Gradient Methods. (arXiv:2002.02508v4 [cs.LG] UPDATED)
    Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ) that prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 \rho_n /\sigma_{\mathrm{GD}}$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.  ( 3 min )
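    The error-compensation principle behind differential quantization can be illustrated with a simple error-feedback loop: the worker adds its accumulated quantization error to each fresh gradient before quantizing, steering the quantized trajectory towards the unquantized one. This is a sketch of the principle under an illustrative uniform scalar quantizer, not the paper's exact DQ-GD construction (which analyzes lattice quantizers with covering efficiency $\rho_n$); all names are illustrative.

```python
import numpy as np

def quantize(v, step):
    # Stand-in uniform scalar quantizer: round each coordinate to a grid.
    return step * np.round(v / step)

def dq_gd(grad, x0, eta=0.1, iters=200, step=0.5):
    """Gradient descent over a rate-limited link with compensation of
    past quantization errors (error-feedback sketch)."""
    x = x0.astype(float).copy()
    e = np.zeros_like(x)                 # accumulated quantization error
    for _ in range(iters):
        g = grad(x)
        q = quantize(g + e, step)        # what the worker transmits
        e = g + e - q                    # residual carried to next round
        x = x - eta * q                  # server-side descent step
    return x
```

On a smooth strongly convex objective such as a quadratic, the iterate tracks unquantized gradient descent up to an offset bounded by the quantizer step, which is the qualitative behavior the paper's contraction-factor analysis makes precise.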
    Network Classification Based Structural Analysis of Real Networks and their Model-Generated Counterparts. (arXiv:1810.08498v4 [cs.SI] UPDATED)
    Data-driven analysis of complex networks has been a focus of research for decades. An important area of research is to study how well real networks can be described with a small selection of metrics, and how well network models can capture the relations between graph metrics observed in real networks. In this paper, we apply machine learning techniques to investigate the aforementioned problems. We study 500 real-world networks along with 2,000 synthetic networks generated by four frequently used network models with previously calibrated parameters to make the generated graphs as similar to the real networks as possible. This paper unifies several branches of data-driven complex network analysis, such as the study of graph metrics and their pair-wise relationships, network similarity estimation, model calibration, and graph classification. We find that the correlation profiles of the structural measures differ significantly across network domains and that the domain can be efficiently determined using a small selection of graph metrics. The structural properties of the network models with fixed parameters are robust enough to perform parameter calibration. The goodness-of-fit of the network models highly depends on the network domain. By solving classification problems, we find that the models lack the capability of generating a graph with a high clustering coefficient and a relatively large diameter simultaneously. On the other hand, the models are able to capture exactly the degree-distribution-related metrics.  ( 2 min )
    Sequence-Based Target Coin Prediction for Cryptocurrency Pump-and-Dump. (arXiv:2204.12929v1 [q-fin.ST])
    As the pump-and-dump schemes (P&Ds) proliferate in the cryptocurrency market, it becomes imperative to detect such fraudulent activities in advance, to inform potentially susceptible investors before they become victims. In this paper, we focus on the target coin prediction task, i.e., to predict the pump probability of all coins listed in the target exchange before a pump. We conduct a comprehensive study of the latest P&Ds, investigate 709 events organized in Telegram channels from Jan. 2019 to Jan. 2022, and unearth some abnormal yet interesting patterns of P&Ds. Empirical analysis demonstrates that pumped coins exhibit intra-channel homogeneity and inter-channel heterogeneity, which inspires us to develop a novel sequence-based neural network named SNN. Specifically, SNN encodes each channel's pump history as a sequence representation via a positional attention mechanism, which filters useful information and alleviates the noise introduced when the sequence length is long. We also identify and address the coin-side cold-start problem in a practical setting. Extensive experiments show a lift of 1.6% AUC and 41.0% Hit Ratio@3 brought by our method, making it well-suited for real-world application. As a side contribution, we release the source code of our entire data science pipeline on GitHub, along with the dataset tailored for studying the latest P&Ds.  ( 2 min )
    Accurate inference of crowdsourcing properties when using efficient allocation strategies. (arXiv:1903.03104v2 [cs.LG] UPDATED)
    Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.  ( 2 min )
    An Iterative Labeling Method for Annotating Fisheries Imagery. (arXiv:2204.12934v1 [cs.LG])
    In this paper, we present a methodology for fisheries-related data that allows us to converge on a labeled image dataset by iterating over the dataset with multiple training and production loops that can exploit crowdsourcing interfaces. We present our algorithm and its results on two separate sets of image data collected using the Seabed autonomous underwater vehicle. The first dataset comprises 2,026 completely unlabeled images, while the second consists of 21,968 images that were point-annotated by experts. Our results indicate that training with a small subset and iterating on that to build a larger set of labeled data allows us to converge to a fully annotated dataset with a small number of iterations. Even in the case of a dataset labeled by experts, a single iteration of the methodology improves the labels by discovering additional complicated examples of labels associated with fish that overlap, are very small, or are obscured by the contrast limitations associated with underwater imagery.  ( 2 min )
    Treating Crowdsourcing as Examination: How to Score Tasks and Online Workers?. (arXiv:2204.13065v1 [cs.HC])
    Crowdsourcing is an online outsourcing mode that can meet current machine learning algorithms' urgent need for massive labeled data. Requesters post tasks on crowdsourcing platforms, which employ online workers over the Internet to complete the tasks, then aggregate and return the results to the requesters. How to model the interaction between different types of workers and tasks is a hot topic. In this paper, we model workers as four types based on their ability: expert, normal worker, sloppy worker, and spammer, and divide tasks into hard, medium, and easy according to their difficulty. We believe that even experts struggle with difficult tasks while sloppy workers can get easy tasks right, and that spammers deliberately give wrong answers. Good examination tasks should therefore have a moderate degree of difficulty and discriminability to score workers more objectively. Thus, we first score workers' ability mainly on the medium-difficulty tasks, then reduce the weight of answers from sloppy workers and modify the answers from spammers when inferring the tasks' ground truth. A probabilistic graphical model is adopted to simulate the task execution process, and an iterative method is used to successively calculate and update the ground truth, the ability of workers, and the difficulty of the tasks. We verify the correctness and effectiveness of our algorithm in both simulated and real crowdsourcing scenarios.  ( 2 min )
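    The alternating scheme of scoring workers and re-inferring truths can be sketched in a simplified binary-label form. This omits the paper's explicit task-difficulty modeling and probabilistic graphical model; the function and the log-odds weighting are illustrative. Note how a near-spammer's weight becomes negative, effectively flipping ("modifying") its answers.

```python
import numpy as np

def infer_truth(answers, n_items, n_workers, iters=10):
    """Alternately estimate binary ground truths and worker abilities.
    answers: iterable of (worker, item, label) triples with label in {0, 1}.
    """
    ability = np.full(n_workers, 0.7)          # initial trust in workers
    truth = np.full(n_items, 0.5)
    for _ in range(iters):
        # Infer soft truths by ability-weighted log-odds voting; a worker
        # with ability < 0.5 gets a negative weight (answers are flipped).
        logit = np.zeros(n_items)
        for w, i, y in answers:
            a = np.clip(ability[w], 1e-3, 1 - 1e-3)
            logit[i] += (2 * y - 1) * np.log(a / (1 - a))
        truth = 1.0 / (1.0 + np.exp(-logit))
        # Re-score ability as agreement with the current soft truths.
        agree = np.zeros(n_workers)
        count = np.zeros(n_workers)
        for w, i, y in answers:
            agree[w] += truth[i] if y == 1 else 1.0 - truth[i]
            count[w] += 1
        ability = agree / np.maximum(count, 1)
    return (truth > 0.5).astype(int), ability
```

On a toy instance with two reliable workers and one spammer who always answers wrongly, the loop recovers the true labels and separates the ability scores.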
    NFT Appraisal Prediction: Utilizing Search Trends, Public Market Data, Linear Regression and Recurrent Neural Networks. (arXiv:2204.12932v1 [q-fin.ST])
    In this paper we investigate the correlation between NFT valuations and various features from three primary categories: public market data, NFT metadata, and social trends data.  ( 2 min )
    Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning. (arXiv:2204.13060v1 [cs.LG])
    Building generalizable goal-conditioned agents from rich observations is a key to reinforcement learning (RL) solving real world problems. Traditionally in goal-conditioned RL, an agent is provided with the exact goal they intend to reach. However, it is often not realistic to know the configuration of the goal before performing a task. A more scalable framework would allow us to provide the agent with an example of an analogous task, and have the agent then infer what the goal should be for its current state. We propose a new form of state abstraction called goal-conditioned bisimulation that captures functional equivariance, allowing for the reuse of skills to achieve new goals. We learn this representation using a metric form of this abstraction, and show its ability to generalize to new goals in simulation manipulation tasks. Further, we prove that this learned representation is sufficient not only for goal conditioned tasks, but is amenable to any downstream task described by a state-only reward function. Videos can be found at https://sites.google.com/view/gc-bisimulation.  ( 2 min )
    Faster online calibration without randomization: interval forecasts and the power of two choices. (arXiv:2204.13087v1 [cs.LG])
    We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $\epsilon$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $\Omega(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, affords the forecaster significant power -- we show that a faster $\epsilon$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.  ( 2 min )
    Can deep learning match the efficiency of human visual long-term memory to store object details?. (arXiv:2204.13061v1 [cs.LG])
    Humans have a remarkably large capacity to store detailed visual information in long-term memory even after a single exposure, as demonstrated by classic experiments in psychology. For example, Standing (1973) showed that humans could recognize with high accuracy thousands of pictures that they had seen only once a few days prior to a recognition test. In deep learning, the primary mode of incorporating new information into a model is through gradient descent in the model's parameter space. This paper asks whether deep learning via gradient descent can match the efficiency of human visual long-term memory to incorporate new information in a rigorous, head-to-head, quantitative comparison. We answer this in the negative: even in the best case, models learning via gradient descent appear to require approximately 10 exposures to the same visual materials in order to reach a recognition memory performance humans achieve after only a single exposure. Prior knowledge induced via pretraining and bigger model sizes improve performance, but these improvements are not very visible after a single exposure (it takes a few exposures for the improvements to become apparent), suggesting that simply scaling up the pretraining data size or model size might not be enough for the model to reach human-level memory efficiency.  ( 2 min )
    NLU++: A Multi-Label, Slot-Rich, Generalisable Dataset for Natural Language Understanding in Task-Oriented Dialogue. (arXiv:2204.13021v1 [cs.CL])
    We present NLU++, a novel dataset for natural language understanding (NLU) in task-oriented dialogue (ToD) systems, with the aim to provide a much more challenging evaluation environment for dialogue NLU models, up to date with the current application and industry requirements. NLU++ is divided into two domains (BANKING and HOTELS) and brings several crucial improvements over current commonly used NLU datasets. \textbf{1)} NLU++ provides fine-grained domain ontologies with a large set of challenging \textit{multi-intent} sentences, introducing and validating the idea of \textit{intent modules} that can be combined into complex intents that convey complex user goals, combined with finer-grained and thus more challenging slot sets. \textbf{2)} The ontology is divided into \textit{domain-specific} and \textit{generic} (i.e., domain-universal) intent modules that overlap across domains, promoting cross-domain reusability of annotated examples. \textbf{3)} The dataset design has been inspired by the problems observed in industrial ToD systems, and \textbf{4)} it has been collected, filtered and carefully annotated by dialogue NLU experts, yielding high-quality annotated data. Finally, we benchmark a series of current state-of-the-art NLU models on NLU++; the results demonstrate the challenging nature of the dataset, especially in low-data regimes, the validity of `intent modularisation', and call for further research on ToD NLU.  ( 2 min )
    Dropout Inference with Non-Uniform Weight Scaling. (arXiv:2204.13047v1 [cs.LG])
    Dropout as regularization has been used extensively to prevent overfitting for training neural networks. During training, units and their connections are randomly dropped, which could be considered as sampling many different submodels from the original model. At test time, weight scaling and Monte Carlo approximation are two widely applied approaches to approximate the outputs. Both approaches work well practically when all submodels are low-bias complex learners. However, in this work, we demonstrate scenarios where some submodels behave closer to high-bias models and a non-uniform weight scaling is a better approximation for inference.  ( 2 min )
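    A minimal example makes the gap concrete: when a nonlinearity follows the dropped layer, the exact average over dropout submodels need not equal the uniformly weight-scaled output, while some non-uniform per-unit scaling can match it. The tiny network and its weights below are purely illustrative.

```python
import itertools
import numpy as np

# One input's hidden activations, dropout on the hidden layer, then a
# ReLU readout -- so the expectation over masks does not commute with
# the nonlinearity.
h = np.array([1.0, -2.0, 0.5])      # hidden activations before dropout
w_out = np.array([1.0, 1.0, 1.0])   # readout weights
p_keep = 0.5

def readout(mask):
    return max((h * mask) @ w_out, 0.0)   # ReLU after the dropped layer

# Exact average over all 2^3 dropout submodels.
masks = [np.array(m) for m in itertools.product([0, 1], repeat=3)]
exact = sum(readout(m) * p_keep**m.sum() * (1 - p_keep)**(3 - m.sum())
            for m in masks)

# Uniform weight scaling collapses to 0 here (the scaled pre-activation
# is negative), while the exact submodel average is 3/8.
uniform = readout(np.full(3, p_keep))
# A non-uniform per-unit scaling can recover the exact value.
nonuniform = readout(np.array([0.375, 0.0, 0.0]))
```

Here `exact` is 0.375 but `uniform` is 0.0: many submodels behave like high-bias learners (output clamped at zero), which is exactly the regime where the paper argues non-uniform scaling is the better inference-time approximation.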
    Binding Actions to Objects in World Models. (arXiv:2204.13022v1 [cs.LG])
    We study the problem of binding actions to objects in object-factored world models using action-attention mechanisms. We propose two attention mechanisms for binding actions to objects, soft attention and hard attention, which we evaluate in the context of structured world models for five environments. Our experiments show that hard attention helps contrastively-trained structured world models to learn to separate individual objects in an object-based grid-world environment. Further, we show that soft attention increases performance of factored world models trained on a robotic manipulation task. The learned action attention weights can be used to interpret the factored world model as the attention focuses on the manipulated object in the environment.  ( 2 min )
    Unsupervised Learning of Unbiased Visual Representations. (arXiv:2204.12941v1 [cs.LG])
    Deep neural networks are known for their inability to learn robust representations when biases exist in the dataset. This results in a poor generalization to unbiased datasets, as the predictions strongly rely on peripheral and confounding factors, which are erroneously learned by the network. Many existing works deal with this issue by either employing an explicit supervision on the bias attributes, or assuming prior knowledge about the bias. In this work we study this problem in a more difficult scenario, in which no explicit annotation about the bias is available, and without any prior knowledge about its nature. We propose a fully unsupervised debiasing framework, consisting of three steps: first, we exploit the natural preference for learning malignant biases, obtaining a bias-capturing model; then, we perform a pseudo-labelling step to obtain bias labels; finally we employ state-of-the-art supervised debiasing techniques to obtain an unbiased model. We also propose a theoretical framework to assess the biasness of a model, and provide a detailed analysis on how biases affect the training of neural networks. We perform experiments on synthetic and real-world datasets, showing that our method achieves state-of-the-art performance in a variety of settings, sometimes even higher than fully supervised debiasing approaches.  ( 2 min )
    Learning to Transfer Role Assignment Across Team Sizes. (arXiv:2204.12937v1 [cs.LG])
    Multi-agent reinforcement learning holds the key for solving complex tasks that demand the coordination of learning agents. However, strong coordination often leads to expensive exploration over the exponentially large state-action space. A powerful approach is to decompose team works into roles, which are ideally assigned to agents with the relevant skills. Training agents to adaptively choose and play emerging roles in a team thus allows the team to scale to complex tasks and quickly adapt to changing environments. These promises, however, have not been fully realised by current role-based multi-agent reinforcement learning methods as they assume either a pre-defined role structure or a fixed team size. We propose a framework to learn role assignment and transfer across team sizes. In particular, we train a role assignment network for small teams by demonstration and transfer the network to larger teams, which continue to learn through interaction with the environment. We demonstrate that re-using the role-based credit assignment structure can foster the learning process of larger reinforcement learning teams to achieve tasks requiring different roles. Our proposal outperforms competing techniques in enriched role-enforcing Prey-Predator games and in new scenarios in the StarCraft II Micro-Management benchmark.  ( 2 min )
    Multi-Objective Physics-Guided Recurrent Neural Networks for Identifying Non-Autonomous Dynamical Systems. (arXiv:2204.12972v1 [eess.SY])
    While trade-offs between modeling effort and model accuracy remain a major concern with system identification, resorting to data-driven methods often leads to a complete disregard for physical plausibility. To address this issue, we propose a physics-guided hybrid approach for modeling non-autonomous systems under control. Starting from a traditional physics-based model, this is extended by a recurrent neural network and trained using a sophisticated multi-objective strategy yielding physically plausible models. While purely data-driven methods fail to produce satisfying results, experiments conducted on real data reveal substantial accuracy improvements by our approach compared to a physics-based model.  ( 2 min )
    Domain Knowledge-Infused Deep Learning for Automated Analog/Radio-Frequency Circuit Parameter Optimization. (arXiv:2204.12948v1 [cs.LG])
    The design automation of analog circuits is a longstanding challenge. This paper presents a reinforcement learning method enhanced by graph learning to automate analog circuit parameter optimization at the pre-layout stage, i.e., finding device parameters that fulfill desired circuit specifications. Unlike all prior methods, our approach is inspired by human experts, who rely on domain knowledge of analog circuit design (e.g., circuit topology and couplings between circuit specifications) to tackle the problem. By incorporating such key domain knowledge into policy training with a multimodal network, the method best learns the complex relations between circuit parameters and design targets, enabling optimal decisions in the optimization process. Experimental results on exemplary circuits show that it achieves human-level design accuracy (99%) with 1.5X the efficiency of existing best-performing methods. Our method also shows better generalization ability to unseen specifications and optimality in circuit performance optimization. Moreover, it can be applied to designing radio-frequency circuits on emerging semiconductor technologies, breaking the limitations of prior learning methods, which were confined to conventional analog circuits.  ( 2 min )
    Meshless method stencil evaluation with machine learning. (arXiv:2204.12940v1 [cs.LG])
    Meshless methods are an active and modern branch of numerical analysis with many intriguing benefits. One of the main open research questions related to local meshless methods is how to select the best possible stencil - a collection of neighbouring nodes - to base the calculation on. In this paper, we describe the procedure for generating a labelled stencil dataset and use a variation of PointNet - a deep learning network based on point clouds - to create a classifier for the quality of a stencil. We exploit features of PointNet to implement a model that can be used to classify differently sized stencils and compare it against models dedicated to a single stencil size. The model is particularly good at detecting the best and the worst stencils, with a respectable area under the curve (AUC) metric of around 0.90. There is much potential for further improvement and direct application in the meshless domain.  ( 2 min )
    GypSum: Learning Hybrid Representations for Code Summarization. (arXiv:2204.12916v1 [cs.SE])
    Code summarization with deep learning has been widely studied in recent years. Current deep learning models for code summarization generally follow the principle of neural machine translation and adopt the encoder-decoder framework, where the encoder learns semantic representations from source code and the decoder transforms the learnt representations into human-readable text that describes the functionality of code snippets. Although they achieve new state-of-the-art performance, we notice that current models often either generate less fluent summaries or fail to capture the core functionality, since they usually focus on a single type of code representation. As such, we propose GypSum, a new deep learning model that learns hybrid representations using graph attention neural networks and a pre-trained programming and natural language model. We introduce particular edges related to the control flow of a code snippet into the abstract syntax tree for graph construction, and design two encoders to learn from the graph and the token sequence of source code, respectively. We modify the encoder-decoder sublayer in the Transformer's decoder to fuse the representations and propose a dual-copy mechanism to facilitate summary generation. Experimental results demonstrate the superior performance of GypSum over existing code summarization models.  ( 2 min )
    A Bayesian Approach To Graph Partitioning. (arXiv:2204.12927v1 [cs.LG])
    We present a new algorithm, based on Bayesian inference with a Gaussian Process (GP), for learning local graph conductance. It uses advanced MCMC convergence ideas to create a scalable and fast algorithm that converges to the stationary distribution, which is used to learn the behavior of conductance when traversing an undirected weighted graph. First, metric embedding is used to represent the vertices of the graph. Then, uniform induced conductance is calculated for training points. Finally, in the learning step, a Gaussian process is used to approximate the uniform induced conductance. MCMC is used to measure the uncertainty of the estimated hyper-parameters.  ( 2 min )
    Using the Projected Belief Network at High Dimensions. (arXiv:2204.12922v1 [cs.LG])
    The projected belief network (PBN) is a layered generative network (LGN) with a tractable likelihood function, and is based on a feed-forward neural network (FFNN). There are two versions of the PBN: stochastic and deterministic (D-PBN), and each has theoretical advantages over other LGNs. However, implementation of the PBN requires an iterative algorithm that includes the inversion of a symmetric matrix of size M x M in each layer, where M is the layer output dimension. This, and the fact that the network must always be dimension-reducing in each layer, can limit the types of problems to which the PBN can be applied. In this paper, we describe techniques to avoid or mitigate these restrictions and use the PBN effectively at high dimension. We apply the discriminatively aligned PBN (PBN-DA) to classifying and auto-encoding high-dimensional spectrograms of acoustic events. We also present the discriminatively aligned D-PBN for the first time.  ( 2 min )
    Discovering Quantum Phase Transitions with Fermionic Neural Networks. (arXiv:2202.05183v2 [physics.comp-ph] UPDATED)
    Deep neural networks have been extremely successful as highly accurate wave function ans\"atze for variational Monte Carlo calculations of molecular ground states. We present an extension of one such ansatz, FermiNet, to calculations of the ground states of periodic Hamiltonians, and study the homogeneous electron gas. FermiNet calculations of the ground-state energies of small electron gas systems are in excellent agreement with previous initiator full configuration interaction quantum Monte Carlo and diffusion Monte Carlo calculations. We investigate the spin-polarized homogeneous electron gas and demonstrate that the same neural network architecture is capable of accurately representing both the delocalized Fermi liquid state and the localized Wigner crystal state. The network is given no \emph{a priori} knowledge that a phase transition exists, but converges on the translationally invariant ground state at high density and spontaneously breaks the symmetry to produce the crystalline ground state at low density.  ( 2 min )
    Variance-Reduced Heterogeneous Federated Learning via Stratified Client Selection. (arXiv:2201.05762v2 [cs.LG] UPDATED)
    Client selection strategies are widely adopted to handle the communication-efficiency problem in recent studies of Federated Learning (FL). However, due to the large variance of the selected subset's update, prior selection approaches with a limited sampling ratio cannot perform well on convergence and accuracy in heterogeneous FL. To address this problem, in this paper, we propose a novel stratified client selection scheme to reduce the variance in pursuit of better convergence and higher accuracy. Specifically, to mitigate the impact of heterogeneity, we develop stratification based on clients' local data distributions to derive approximately homogeneous strata for better selection within each stratum. Concentrating on the limited-sampling-ratio scenario, we next present an optimized sample size allocation scheme that accounts for the diversity of each stratum's variability, promising further variance reduction. Theoretically, we elaborate the explicit relation among different selection schemes with regard to variance; under heterogeneous settings, we demonstrate the effectiveness of our selection scheme. Experimental results confirm that our approach not only allows for better performance relative to state-of-the-art methods but also is compatible with prevalent FL algorithms.  ( 2 min )
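    The abstract does not spell out its allocation rule, but a classical variance-minimizing instance of the idea of "allocating by stratum variability" is Neyman allocation, where the per-stratum sample count is proportional to stratum size times stratum standard deviation. The sketch below illustrates that principle only; the paper's exact scheme may differ.

```python
# Neyman allocation: split a fixed sampling budget across strata in
# proportion to N_h * S_h (stratum size times stratum variability).
# Illustrative sketch; not the paper's exact allocation rule.

def neyman_allocation(budget, sizes, stds):
    """Return per-stratum sample counts proportional to size * std."""
    weights = [n * s for n, s in zip(sizes, stds)]
    total = sum(weights)
    return [round(budget * w / total) for w in weights]

# Example: three client strata with differing update variability.
alloc = neyman_allocation(budget=10, sizes=[50, 30, 20], stds=[1.0, 2.0, 0.5])
```

Note how the smaller but highly variable second stratum receives more of the budget than an even split would give it.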
    Explainable k-means. Don't be greedy, plant bigger trees!. (arXiv:2111.03193v2 [cs.LG] UPDATED)
    We provide a new bi-criteria $\tilde{O}(\log^2 k)$ competitive algorithm for explainable $k$-means clustering. Explainable $k$-means was recently introduced by Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML 2020). It is described by an easy to interpret and understand (threshold) decision tree or diagram. The cost of the explainable $k$-means clustering equals the sum of costs of its clusters; and the cost of each cluster equals the sum of squared distances from the points in the cluster to the center of that cluster. The best non-bi-criteria algorithm for explainable clustering is $\tilde{O}(k)$ competitive, and this bound is tight. Our randomized bi-criteria algorithm constructs a threshold decision tree that partitions the data set into $(1+\delta)k$ clusters (where $\delta\in (0,1)$ is a parameter of the algorithm). The cost of this clustering is at most $\tilde{O}(1/ \delta \cdot \log^2 k)$ times the cost of the optimal unconstrained $k$-means clustering. We show that this bound is almost optimal.  ( 2 min )
    Certified Robustness via Randomized Smoothing over Multiplicative Parameters. (arXiv:2106.14432v2 [cs.LG] UPDATED)
    Currently the most popular method of providing robustness certificates is randomized smoothing, where an input is smoothed via some probability distribution. We propose a novel approach to randomized smoothing over multiplicative parameters. Using this method we construct certifiably robust classifiers with respect to a gamma correction perturbation and compare the result with classifiers obtained via other smoothing distributions (Gaussian, Laplace, uniform). The experiments show that the asymmetrical Rayleigh distribution yields better certificates for some values of the perturbation parameters. To the best of our knowledge, this is the first work concerning certified robustness against the multiplicative gamma correction transformation and the first to study the effects of asymmetrical distributions in randomized smoothing.  ( 2 min )
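    The mechanics of smoothing over a multiplicative parameter can be sketched as follows: the smoothed classifier votes over gamma-corrected copies x**g of the input, with g drawn from the smoothing distribution (Rayleigh here). The base classifier below (a mean-brightness threshold) is a hypothetical stand-in, not the paper's model, and no certificate is computed.

```python
import numpy as np

# Sketch: randomized smoothing over a multiplicative gamma parameter.
# g ~ Rayleigh(scale); each vote classifies the gamma-corrected input x**g.

def base_classifier(x):
    # Hypothetical stand-in classifier: "bright" vs "dark" image.
    return 1 if x.mean() > 0.5 else 0

def smoothed_classifier(x, n_samples=1000, scale=0.3, seed=0):
    rng = np.random.default_rng(seed)
    gammas = rng.rayleigh(scale, size=n_samples)  # multiplicative smoothing noise
    votes = [base_classifier(np.clip(x, 1e-6, 1.0) ** g) for g in gammas]
    return int(np.round(np.mean(votes)))  # majority vote over perturbed copies

x = np.full((8, 8), 0.9)  # bright image, pixel values in [0, 1]
pred = smoothed_classifier(x)
```

A certificate would additionally bound how far the multiplicative perturbation can move the vote fractions; that analysis is distribution-specific and omitted here.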
    Generalization Bounds with Minimal Dependency on Hypothesis Class via Distributionally Robust Optimization. (arXiv:2106.11180v2 [math.OC] UPDATED)
    Established approaches to obtain generalization bounds in data-driven optimization and machine learning mostly build on solutions from empirical risk minimization (ERM), which depend crucially on the functional complexity of the hypothesis class. In this paper, we present an alternate route to obtain these bounds on the solution from distributionally robust optimization (DRO), a recent data-driven optimization framework based on worst-case analysis and the notion of ambiguity set to capture statistical uncertainty. In contrast to the hypothesis class complexity in ERM, our DRO bounds depend on the ambiguity set geometry and its compatibility with the true loss function. Notably, when using maximum mean discrepancy as a DRO distance metric, our analysis implies generalization bounds that depend solely on the true loss function. To the best of our knowledge, this is the first generalization bound in the literature that is entirely independent of any other candidate in the hypothesis class. We hope our findings can open the door for a better understanding of DRO, especially its benefits on loss minimization and other machine learning applications.  ( 2 min )
    BINAS: Bilinear Interpretable Neural Architecture Search. (arXiv:2110.12399v3 [cs.LG] UPDATED)
    Practical use of neural networks often involves requirements on latency, energy, and memory, among others. A popular approach to finding networks under such requirements is constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and are sensitive to many hyperparameters that must be tuned; as a result, the accuracy of the generated models often suffers. In this work we resolve this by introducing Bilinear Interpretable Neural Architecture Search (BINAS), which is based on an accurate and simple bilinear formulation of both an accuracy estimator and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed estimator, together with the intuitive way it is constructed, brings interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that BINAS generates architectures comparable to or better than those of other state-of-the-art NAS methods within a reduced marginal search cost, while strictly satisfying the resource constraints.  ( 2 min )
    Online Deep Learning from Doubly-Streaming Data. (arXiv:2204.11793v2 [cs.LG] UPDATED)
    This paper investigates a new online learning problem with doubly-streaming data, where the data streams are described by feature spaces that constantly evolve, with new features emerging and old features fading away. The challenges of this problem are twofold: 1) Data samples ceaselessly flowing in may carry shifted patterns over time, requiring learners to update and hence adapt on-the-fly. 2) Newly emerging features are described by very few samples, resulting in weak learners that tend to make erroneous predictions. A plausible idea to overcome these challenges is to establish a relationship between the pre- and post-evolving feature spaces, so that an online learner can leverage the knowledge learned from the old features to improve learning performance on the new features. Unfortunately, this idea does not scale up to high-dimensional media streams with complex feature interplay, which suffer a tradeoff between onlineness (biasing shallow learners) and expressiveness (requiring deep learners). Motivated by this, we propose a novel OLD^3S paradigm, where a shared latent subspace is discovered to summarize information from the old and new feature spaces, building an intermediate feature mapping relationship. A key trait of OLD^3S is to treat the model capacity as learnable semantics, yielding optimal model depth and parameters jointly, in accordance with the complexity and non-linearity of the input data streams, in an online fashion. Both theoretical analyses and empirical studies substantiate the viability and effectiveness of our proposal.  ( 2 min )
    TERMinator: A Neural Framework for Structure-Based Protein Design using Tertiary Repeating Motifs. (arXiv:2204.13048v1 [q-bio.BM])
    Computational protein design has the potential to deliver novel molecular structures, binders, and catalysts for myriad applications. Recent neural graph-based models that use backbone coordinate-derived features show exceptional performance on native sequence recovery tasks and are promising frameworks for design. A statistical framework for modeling protein sequence landscapes using Tertiary Motifs (TERMs), compact units of recurring structure in proteins, has also demonstrated good performance on protein design tasks. In this work, we investigate the use of TERM-derived data as features in neural protein design frameworks. Our graph-based architecture, TERMinator, incorporates TERM-based and coordinate-based information and outputs a Potts model over sequence space. TERMinator outperforms state-of-the-art models on native sequence recovery tasks, suggesting that utilizing TERM-based and coordinate-based features together is beneficial for protein design.  ( 2 min )
    An Empirical Evaluation of Flow Based Programming in the Machine Learning Deployment Context. (arXiv:2204.12781v1 [cs.SE])
    As the use of data-driven technologies spreads, software engineers are more often faced with the task of solving a business problem using data-driven methods such as machine learning (ML) algorithms. Deployment of ML within large software systems brings new challenges that are not addressed by standard engineering practices, and as a result businesses observe a high rate of ML deployment project failures. Data Oriented Architecture (DOA) is an emerging approach that can support data scientists and software developers when addressing such challenges. However, there is a lack of clarity about how DOA systems should be implemented in practice. This paper proposes to consider Flow-Based Programming (FBP) as a paradigm for creating DOA applications. We empirically evaluate FBP in the context of ML deployment on four applications that represent typical data science projects. We use Service Oriented Architecture (SOA) as a baseline for comparison. Evaluation is done with respect to different application domains, ML deployment stages, and code quality metrics. Results reveal that FBP is a suitable paradigm for data collection and data science tasks, and is able to simplify data collection and discovery when compared with SOA. We discuss the advantages of FBP as well as the gaps that need to be addressed to increase FBP adoption as a standard design paradigm for DOA.  ( 2 min )
    MAPLE-Edge: A Runtime Latency Predictor for Edge Devices. (arXiv:2204.12950v1 [cs.LG])
    Neural Architecture Search (NAS) has enabled automatic discovery of more efficient neural network architectures, especially for mobile and embedded vision applications. Although recent research has proposed ways of quickly estimating latency on unseen hardware devices with just a few samples, little focus has been given to the challenges of estimating latency on runtimes using optimized graphs, such as TensorRT, and specifically for edge devices. In this work, we propose MAPLE-Edge, an edge device-oriented extension of MAPLE, the state-of-the-art latency predictor for general purpose hardware, where we train a regression network on architecture-latency pairs in conjunction with a hardware-runtime descriptor to effectively estimate latency on a diverse pool of edge devices. Compared to MAPLE, MAPLE-Edge can describe the runtime and target device platform using a much smaller set of CPU performance counters that are widely available on all Linux kernels, while still achieving up to +49.6% accuracy gains against previous state-of-the-art baseline methods on optimized edge device runtimes, using just 10 measurements from an unseen target device. We also demonstrate that unlike MAPLE, which performs best when trained on a pool of devices sharing a common runtime, MAPLE-Edge can effectively generalize across runtimes by normalizing performance counters by the operator latency in the measured hardware-runtime descriptor. Lastly, we show that for runtimes exhibiting lower than desired accuracy, performance can be boosted by collecting additional samples from the target device, with an extra 90 samples translating to gains of nearly +40%.  ( 2 min )
    Uncertainty-Aware Prediction of Battery Energy Consumption for Hybrid Electric Vehicles. (arXiv:2204.12825v1 [cs.LG])
    The usability of vehicles is highly dependent on their energy consumption. In particular, one of the main factors hindering the mass adoption of electric (EV), hybrid (HEV), and plug-in hybrid (PHEV) vehicles is range anxiety, which occurs when a driver is uncertain about the availability of energy for a given trip. To tackle this problem, we propose a machine learning approach for modeling the battery energy consumption. By reducing predictive uncertainty, this method can help increase trust in the vehicle's performance and thus boost its usability. Most related work focuses on physical and/or chemical models of the battery that affect the energy consumption. We propose a data-driven approach which relies on real-world datasets including battery related attributes. Our approach showed an improvement in terms of predictive uncertainty as well as in accuracy compared to traditional methods.  ( 2 min )
    Adaptable Text Matching via Meta-Weight Regulator. (arXiv:2204.12668v1 [cs.IR])
    Neural text matching models have been used in a range of applications such as question answering and natural language inference, and have yielded good performance. However, these neural models have limited adaptability, resulting in a decline in performance when encountering test examples from a different dataset or even a different task. Adaptability is particularly important in the few-shot setting: in many cases, there is only a limited amount of labeled data available for a target dataset or task, while we may have access to a richly labeled source dataset or task. However, adapting a model trained on the abundant source data to a few-shot target dataset or task is challenging. To tackle this challenge, we propose a Meta-Weight Regulator (MWR), a meta-learning approach that learns to assign weights to the source examples based on their relevance to the target loss. Specifically, MWR first trains the model on the uniformly weighted source examples, and measures the efficacy of the model on the target examples via a loss function. By iteratively performing a (meta) gradient descent, high-order gradients are propagated to the source examples. These gradients are then used to update the weights of source examples, in a way that is relevant to the target performance. As MWR is model-agnostic, it can be applied to any backbone neural model. Extensive experiments are conducted with various backbone text matching models, on four widely used datasets and two tasks. The results demonstrate that our proposed approach significantly outperforms a number of existing adaptation methods and effectively improves the cross-dataset and cross-task adaptability of the neural text matching models in the few-shot setting.  ( 2 min )
    Topological Data Analysis for Anomaly Detection in Host-Based Logs. (arXiv:2204.12919v1 [cs.LG])
    Topological Data Analysis (TDA) gives practitioners the ability to analyse the global structure of cybersecurity data. We use TDA for anomaly detection in host-based logs collected with the open-source Logging Made Easy (LME) project. We present an approach that builds a filtration of simplicial complexes directly from Windows logs, enabling analysis of their intrinsic structure using topological tools. We compare the efficacy of persistent homology and the spectrum of graph and hypergraph Laplacians as feature vectors against a standard log embedding that counts events, and find that topological and spectral embeddings of computer logs contain discriminative information for classifying anomalous logs that is complementary to standard embeddings. We end by discussing the potential for our methods to be used as part of an explainable framework for anomaly detection.  ( 2 min )
    FlowGNN: A Dataflow Architecture for Universal Graph Neural Network Inference via Multi-Queue Streaming. (arXiv:2204.13103v1 [cs.DC])
    Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems such as quantum chemistry, drug discovery, and high energy physics. However, meeting demand for novel GNN models and fast inference simultaneously is challenging because of the gap between developing efficient accelerators and the rapid creation of new GNN models. Prior art focuses on the acceleration of specific classes of GNNs, such as Graph Convolutional Network (GCN), but lacks the generality to support a wide range of existing or new GNN models. Meanwhile, most works rely on graph pre-processing to exploit data locality, making them unsuitable for real-time applications. To address these limitations, in this work, we propose a generic dataflow architecture for GNN acceleration, named FlowGNN, which can flexibly support the majority of message-passing GNNs. The contributions are three-fold. First, we propose a novel and scalable dataflow architecture, which flexibly supports a wide range of GNN models with a message-passing mechanism. The architecture features a configurable dataflow optimized for simultaneous computation of node embedding, edge embedding, and message passing, which is generally applicable to all models. We also propose a rich library of model-specific components. Second, we deliver ultra-fast real-time GNN inference without any graph pre-processing, making it agnostic to dynamically changing graph structures. Third, we verify our architecture on the Xilinx Alveo U50 FPGA board and measure the on-board end-to-end performance. We achieve a speed-up of up to 51-254x against CPU (6226R) and 1.3-477x against GPU (A6000) (with batch sizes 1 through 1024); we also outperform the SOTA GNN accelerator I-GCN by 1.03x and 1.25x across two datasets. Our implementation code and on-board measurement are publicly available on GitHub.  ( 2 min )
    Trainable Compound Activation Functions for Machine Learning. (arXiv:2204.12920v1 [cs.LG])
    Activation functions (AF) are necessary components of neural networks that allow approximation of functions, but AFs in current use are usually simple monotonically increasing functions. In this paper, we propose trainable compound AF (TCA) composed of a sum of shifted and scaled simple AFs. TCAs increase the effectiveness of networks with fewer parameters compared to added layers. TCAs have a special interpretation in generative networks because they effectively estimate the marginal distributions of each dimension of the data using a mixture distribution, reducing modality and making linear dimension reduction more effective. When used in restricted Boltzmann machines (RBMs), they result in a novel type of RBM with mixture-based stochastic units. Improved performance is demonstrated in experiments using RBMs, deep belief networks (DBN), projected belief networks (PBN), and variational auto-encoders (VAE).  ( 2 min )
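    The compound form described above, a sum of shifted and scaled copies of a simple base AF, can be sketched directly. The base function (tanh), component count, and parameter values below are illustrative; in practice the scales, shifts, and weights would be trained by backpropagation alongside the rest of the network.

```python
import numpy as np

# Sketch of a trainable compound activation function (TCA):
# tca(x) = sum_i weights[i] * tanh(scales[i] * x + shifts[i]).

def tca(x, scales, shifts, weights):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for a, s, b in zip(weights, scales, shifts):
        out += a * np.tanh(s * x + b)  # one shifted, scaled component
    return out

# Three shifted tanh components give a multi-step activation, which is how
# a TCA can approximate the marginal CDF of multimodal data per dimension.
y = tca([-2.0, 0.0, 2.0], scales=[1, 1, 1], shifts=[-1.5, 0.0, 1.5], weights=[1, 1, 1])
```

With symmetric shifts the compound function stays odd and monotone here, but unequal weights and scales let it flatten or steepen around each shift, matching a mixture of modes.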
    Spending Privacy Budget Fairly and Wisely. (arXiv:2204.12903v1 [cs.LG])
    Differentially private (DP) synthetic data generation is a practical method for improving access to data as a means to encourage productive partnerships. One issue inherent to DP is that the "privacy budget" is generally "spent" evenly across features in the data set. This leads to good statistical parity with the real data, but can undervalue the conditional probabilities and marginals that are critical for predictive quality of synthetic data. Further, loss of predictive quality may be non-uniform across the data set, with subsets that correspond to minority groups potentially suffering a higher loss. In this paper, we develop ensemble methods that distribute the privacy budget "wisely" to maximize predictive accuracy of models trained on DP data, and "fairly" to bound potential disparities in accuracy across groups and reduce inequality. Our methods are based on the insights that feature importance can inform how privacy budget is allocated, and, further, that per-group feature importance and fairness-related performance objectives can be incorporated in the allocation. These insights make our methods tunable to social contexts, allowing data owners to produce balanced synthetic data for predictive analysis.  ( 2 min )
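    The core insight, allocating the privacy budget in proportion to feature importance rather than evenly, can be sketched as below. The importance scores are illustrative placeholders; the paper's ensemble methods and fairness-constrained allocation are not reproduced here.

```python
# Sketch: split a total privacy budget across features in proportion to
# (normalized) feature importance, instead of spending it evenly.

def allocate_budget(total_epsilon, importances):
    total = sum(importances)
    return [total_epsilon * imp / total for imp in importances]

# Four features; the most predictive one receives a larger share of the
# budget (hence less added noise) than the even split of 0.25 each.
eps = allocate_budget(total_epsilon=1.0, importances=[0.5, 0.3, 0.15, 0.05])
```

By sequential composition, the per-feature budgets still sum to the overall epsilon, so the total privacy guarantee is unchanged.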
    First do no harm: counterfactual objective functions for safe & ethical AI. (arXiv:2204.12993v1 [cs.AI])
    To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. In this paper we develop the first statistical definition of harm and a framework for factoring harm into algorithmic decisions. We argue that harm is fundamentally a counterfactual quantity, and show that standard machine learning algorithms are guaranteed to pursue harmful policies in certain environments. To resolve this, we derive a family of counterfactual objective functions that robustly mitigate for harm. We demonstrate our approach with a statistical model for identifying optimal drug doses. While identifying optimal doses using the causal treatment effect results in harmful treatment decisions, our counterfactual algorithm identifies doses that are far less harmful without sacrificing efficacy. Our results show that counterfactual reasoning is a key ingredient for safe and ethical AI.  ( 2 min )
    Ollivier-Ricci Curvature For Head Pose Estimation From a Single Image. (arXiv:2204.13006v1 [cs.CV])
    Head pose estimation is a crucial challenge for many real-world applications, such as attention and human behavior analysis. This paper aims to estimate head pose from a single image by applying notions of network curvature. In the real world, many complex networks have groups of nodes that are well connected to each other with significant functional roles. Similarly, the interactions of facial landmarks can be represented as complex dynamic systems modeled by weighted graphs. The functionalities of such systems are therefore intrinsically linked to the topology and geometry of the underlying graph. In this work, using the geometric notion of Ollivier-Ricci curvature (ORC) on weighted graphs as input to the XGBoost regression model, we show that the intrinsic geometric basis of ORC offers a natural approach to discovering underlying common structure within a pool of poses. Experiments on the BIWI, AFLW2000 and Pointing'04 datasets show that the ORC_XGB method performs well compared to state-of-the-art methods, both landmark-based and image-only.
    Scalable particle-based alternatives to EM. (arXiv:2204.12965v1 [stat.CO])
    Building on (Neal and Hinton, 1998), where the problem tackled by EM is recast as the optimization of a free energy functional on an infinite-dimensional space, we obtain three practical particle-based alternatives to EM applicable to broad classes of models. All three are derived through straightforward discretizations of gradient flows associated with the functional. The novel algorithms scale well to high-dimensional settings and outperform existing state-of-the-art methods in numerical experiments.
    Epicardial Adipose Tissue Segmentation from CT Images with A Semi-3D Neural Network. (arXiv:2204.12904v1 [eess.IV])
    Epicardial adipose tissue is a type of adipose tissue located between the heart wall and a protective layer around the heart called the pericardium. The volume and thickness of epicardial adipose tissue are linked to various cardiovascular diseases. It is shown to be an independent cardiovascular disease risk factor. Fully automatic and reliable measurements of epicardial adipose tissue from CT scans could provide better disease risk assessment and enable the processing of large CT image data sets for a systemic epicardial adipose tissue study. This paper proposes a method for fully automatic semantic segmentation of epicardial adipose tissue from CT images using a deep neural network. The proposed network uses a U-Net-based architecture with slice depth information embedded in the input image to segment a pericardium region of interest, which is used to obtain an epicardial adipose tissue segmentation. Image augmentation is used to increase model robustness. Cross-validation of the proposed method yields a Dice score of 0.86 on the CT scans of 20 patients.  ( 2 min )
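    The Dice score reported above is the standard overlap metric for binary segmentation masks, Dice = 2|A ∩ B| / (|A| + |B|); a tiny sketch on 1D masks:

```python
import numpy as np

# Dice score between a predicted and a ground-truth binary mask.

def dice_score(pred, target):
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum())

# Toy 1D masks: 2 overlapping pixels, 3 predicted and 3 true positives.
d = dice_score([1, 1, 1, 0, 0], [0, 1, 1, 1, 0])
```

A score of 0.86, as reported, means the predicted and ground-truth tissue regions overlap substantially relative to their combined size.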
    On the Dynamics of Inference and Learning. (arXiv:2204.12939v1 [cond-mat.dis-nn])
    Statistical Inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available this probability distribution becomes updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cram\'{e}r-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes and inference of the coupling constant in the 1D Ising model. Finally we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmarked data sets such as MNIST and CIFAR10 and show that for networks exhibiting small final losses the simple power-law is also satisfied.  ( 2 min )
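    The $1/T$ power-law appears already in the simplest conjugate-Gaussian example: when inferring a Gaussian mean with known noise variance, the posterior variance after $T$ observations is $1/(1/\sigma_0^2 + T/\sigma^2)$, which decays like $\sigma^2/T$. The prior and noise scales below are illustrative.

```python
# Worked example of the 1/T power-law for Bayesian updating:
# posterior variance of a Gaussian mean with known noise variance.

def posterior_variance(T, prior_var=1.0, noise_var=1.0):
    # Conjugate update: precision of prior plus T unit-precision observations.
    return 1.0 / (1.0 / prior_var + T / noise_var)

# For large T the variance scales like noise_var / T.
vars_ = [posterior_variance(T) for T in (10, 100, 1000)]
```

Each tenfold increase in data shrinks the posterior variance by roughly a factor of ten, i.e., the learning rate follows the $1/T$ law described above.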
    Improving Feature Generalizability with Multitask Learning in Class Incremental Learning. (arXiv:2204.12915v1 [cs.LG])
    Many deep learning applications, like keyword spotting, require the incorporation of new concepts (classes) over time, referred to as Class Incremental Learning (CIL). The major challenge in CIL is catastrophic forgetting, i.e., preserving as much of the old knowledge as possible while learning new tasks. Various techniques, such as regularization, knowledge distillation, and the use of exemplars, have been proposed to resolve this issue. However, prior works primarily focus on the incremental learning step, while ignoring the optimization during the base model training. We hypothesize that a more transferable and generalizable feature representation from the base model would be beneficial to incremental learning. In this work, we adopt multitask learning during base model training to improve the feature generalizability. Specifically, instead of training a single model with all the base classes, we decompose the base classes into multiple subsets and regard each of them as a task. These tasks are trained concurrently and a shared feature extractor is obtained for incremental learning. We evaluate our approach on two datasets under various configurations. The results show that our approach enhances the average incremental learning accuracy by up to 5.5%, which enables more reliable and accurate keyword spotting over time. Moreover, the proposed approach can be combined with many existing techniques and provides additional performance gain.  ( 2 min )
    Forecasting Foreign Exchange Rates With Parameter-Free Regression Networks Tuned By Bayesian Optimization. (arXiv:2204.12914v1 [q-fin.ST])
    The article is concerned with the problem of multi-step financial time series forecasting of Foreign Exchange (FX) rates. To address this problem, we introduce a parameter-free regression network termed RegPred Net. The exchange rate to forecast is treated as a stochastic process. It is assumed to follow a generalization of Brownian motion and the mean-reverting process referred to as the generalized Ornstein-Uhlenbeck (OU) process, with time-dependent coefficients. Using past observed values of the input time series, these coefficients can be regressed online by the cells of the first half of the network (Reg). The regressed coefficients depend only on, but are very sensitive to, a small number of hyperparameters that must be set by a global optimization procedure, for which Bayesian optimization is an adequate heuristic. Thanks to its multi-layered architecture, the second half of the regression network (Pred) can project time-dependent values for the OU process coefficients and generate realistic trajectories of the time series. Predictions can be easily derived in the form of expected values estimated by averaging values obtained by Monte Carlo simulation. The forecasting accuracy on a 100 days horizon is evaluated for several of the most important FX rates such as EUR/USD, EUR/CNY, and EUR/GBP. Our experimental results show that the RegPred Net significantly outperforms ARMA, ARIMA, LSTMs, and Autoencoder-LSTM models in this task.  ( 2 min )
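    The forecasting step, simulating OU trajectories and averaging them by Monte Carlo, can be sketched with Euler-Maruyama. The paper uses a generalized OU process with time-dependent, regressed coefficients; the fixed theta/mu/sigma values below are illustrative placeholders.

```python
import numpy as np

# Sketch: Monte Carlo forecast of a constant-coefficient OU process
#   dX = theta * (mu - X) dt + sigma dW
# simulated by Euler-Maruyama over a 100-step horizon.

def simulate_ou(x0, theta, mu, sigma, horizon, n_paths, dt=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, x0, dtype=float)
    for _ in range(horizon):
        x += theta * (mu - x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return x

# Hypothetical rate starting at 1.10 reverting towards a mean of 1.20.
paths = simulate_ou(x0=1.10, theta=0.05, mu=1.20, sigma=0.005, horizon=100, n_paths=5000)
forecast = paths.mean()  # expected value via Monte Carlo averaging
```

Over 100 steps the mean reversion pulls the expected value most of the way from the starting level towards mu, which is exactly the quantity the averaging recovers.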
    LiftPool: Lifting-based Graph Pooling for Hierarchical Graph Representation Learning. (arXiv:2204.12881v1 [cs.LG])
    Graph pooling has been increasingly considered for graph neural networks (GNNs) to facilitate hierarchical graph representation learning. Existing graph pooling methods commonly consist of two stages, i.e., selecting the top-ranked nodes and removing the remaining nodes to construct a coarsened graph representation. However, the local structural information of the removed nodes is inevitably dropped in these methods, due to the inherent coupling of nodes (location) and their features (signals). In this paper, we propose an enhanced three-stage method via lifting, named LiftPool, to improve hierarchical graph representation by maximally preserving the local structural information in graph pooling. LiftPool introduces an additional stage of graph lifting before graph coarsening to preserve the local information of the removed nodes and decouple the processes of node removal and feature reduction. Specifically, for each node to be removed, its local information is obtained by subtracting the global information aggregated from its neighboring preserved nodes. Subsequently, this local information is aligned and propagated to the preserved nodes to alleviate information loss in graph coarsening. Furthermore, we demonstrate that the proposed LiftPool is localized and permutation-invariant. The proposed graph lifting structure is general and can be integrated with existing downsampling-based graph pooling methods. Evaluations on benchmark graph datasets show that LiftPool substantially outperforms state-of-the-art graph pooling methods on the task of graph classification.  ( 2 min )
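    A minimal sketch of the lifting idea, under the assumption that "global information" is the mean over preserved neighbors and that the detail is pushed back additively (the function name and aggregation choice are hypothetical, not the paper's exact operators):

```python
import numpy as np

def lift_and_coarsen(X, A, keep):
    """Lifting before coarsening: for each node to be removed, subtract the
    information aggregated from its preserved neighbors to obtain its local
    detail, then propagate that detail back onto the preserved neighbors so
    it is not lost when the node is dropped."""
    n = X.shape[0]
    keep = np.asarray(keep)
    removed = np.setdiff1d(np.arange(n), keep)
    X = X.astype(float).copy()
    for r in removed:
        nbrs = np.intersect1d(np.nonzero(A[r])[0], keep)
        if nbrs.size == 0:
            continue
        global_info = X[nbrs].mean(axis=0)   # aggregated from preserved neighbors
        local_info = X[r] - global_info      # detail the coarse graph would lose
        X[nbrs] += local_info / nbrs.size    # propagate detail to preserved nodes
    return X[keep], A[np.ix_(keep, keep)]    # coarsened features and adjacency

# Path graph 0-1-2-3; keep nodes 0 and 2.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
X = np.arange(8, dtype=float).reshape(4, 2)
Xc, Ac = lift_and_coarsen(X, A, [0, 2])
```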
    Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study. (arXiv:2204.12868v1 [stat.ML])
    This paper compares the performance of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as is to be expected, performance was better for continuous responses compared to binary data and with larger samples.  ( 2 min )
    Detecting Backdoor Poisoning Attacks on Deep Neural Networks by Heatmap Clustering. (arXiv:2204.12848v1 [cs.LG])
    Predictions made by neural networks can be fraudulently altered by so-called poisoning attacks. Backdoor poisoning attacks are a special case. We study suitable detection methods and introduce a new method called Heatmap Clustering. There, we apply a $k$-means clustering algorithm to heatmaps produced by the state-of-the-art explainable AI method Layer-wise Relevance Propagation. The goal is to separate poisoned from un-poisoned data in the dataset. We compare this method with a similar method, called Activation Clustering, which also uses $k$-means clustering but applies it to the activations of certain hidden layers of the neural network. We test the performance of both approaches for standard backdoor poisoning attacks, label-consistent poisoning attacks, and label-consistent poisoning attacks with reduced-amplitude stickers. We show that Heatmap Clustering consistently performs better than Activation Clustering. However, when considering label-consistent poisoning attacks, the latter method also yields good detection performance.  ( 2 min )
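    The clustering side of the detection can be sketched with a tiny two-cluster $k$-means on flattened heatmaps; the heatmaps below are synthetic stand-ins (real ones would come from Layer-wise Relevance Propagation), and the farthest-point initialization is an implementation choice, not the paper's:

```python
import numpy as np

def kmeans2(X, n_iter=50):
    """Minimal k-means with k=2 (Lloyd's algorithm), initialized with the
    first sample and the sample farthest from it."""
    c0 = X[0]
    c1 = X[np.argmax(np.linalg.norm(X - c0, axis=1))]
    centers = np.stack([c0, c1])
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=2).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

# Synthetic "heatmaps": poisoned samples concentrate relevance on a trigger patch.
rng = np.random.default_rng(1)
clean = rng.normal(0.0, 0.1, size=(50, 64))
poisoned = clean[:10] + 0.0
poisoned[:, :4] += 3.0                      # strong relevance on a fixed patch
labels = kmeans2(np.vstack([clean, poisoned]))
```

    Because the trigger concentrates relevance on a fixed patch, the poisoned heatmaps form a tight, well-separated cluster, which is the intuition the detection method exploits.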
    Accelerating Robot Learning of Contact-Rich Manipulations: A Curriculum Learning Study. (arXiv:2204.12844v1 [cs.RO])
    The Reinforcement Learning (RL) paradigm has been an essential tool for automating robotic tasks. Despite the advances in RL, it is still not widely adopted in industry due to the large and expensive amount of interaction between the robot and its environment that it requires. Curriculum Learning (CL) has been proposed to expedite learning. However, most research works have only been evaluated in simulated environments, from video games to robotic toy tasks. This paper presents a study for accelerating robot learning of contact-rich manipulation tasks based on Curriculum Learning combined with Domain Randomization (DR). We tackle complex industrial assembly tasks with position-controlled robots, such as insertion tasks. We compare different curriculum designs and sampling approaches for DR. Based on this study, we propose a method that significantly outperforms previous work, which uses DR only (no CL), with less than a fifth of the training time (samples). Results also show that even when training only in simulation with toy tasks, our method can learn policies that transfer to a real-world robot. The learned policies achieved success rates of up to 86\% on real-world complex industrial insertion tasks (with tolerances of $\pm 0.01~mm$) not seen during training.  ( 2 min )
    Transfer Learning with Pre-trained Conditional Generative Models. (arXiv:2204.12833v1 [cs.LG])
    Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods generally assume at least one of the following: (i) the source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, all of these assumptions are difficult to hold in practical settings, because the target task rarely has the same labels as the source task, source dataset access is restricted due to licensing and storage costs, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with a synthesized dataset by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform the baselines of scratch training and knowledge distillation.  ( 2 min )
    Learning to Parallelize in a Shared-Memory Environment with Transformers. (arXiv:2204.12835v1 [cs.DC])
    In recent years, the world has switched to many-core and multi-core shared memory architectures. As a result, there is a growing need to utilize these architectures by introducing shared memory parallelization schemes to software applications. OpenMP is the most comprehensive API that implements such schemes, characterized by a readable interface. Nevertheless, introducing OpenMP into code is challenging due to pervasive pitfalls in the management of parallel shared memory. To ease this task, many source-to-source (S2S) compilers have been created over the years, tasked with inserting OpenMP directives into code automatically. In addition to their limited robustness to input formats, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. In this work, we propose leveraging recent advances in ML techniques, specifically in natural language processing (NLP), to replace S2S compilers altogether. We create a database (corpus), Open-OMP, specifically for this goal. Open-OMP contains over 28,000 code snippets, half of which contain OpenMP directives, while the other half, with high probability, do not need parallelization at all. We use the corpus to train systems to automatically classify code segments in need of parallelization, as well as to suggest individual OpenMP clauses. We train several transformer models, named PragFormer, for these tasks, and show that they outperform statistically-trained baselines and automatic S2S parallelization compilers in both classifying the overall need for an OpenMP directive and the introduction of private and reduction clauses. Our source code and database are available at: https://github.com/Scientific-Computing-Lab-NRCN/PragFormer.  ( 2 min )
    Supervised Contrastive CSI Representation Learning for Massive MIMO Positioning. (arXiv:2204.12796v1 [cs.IT])
    A similarity metric is crucial for massive MIMO positioning utilizing channel state information (CSI). In this letter, we propose a novel massive MIMO CSI similarity learning method via a deep convolutional neural network (DCNN) and contrastive learning. A contrastive loss function is designed considering multiple positive and negative CSI samples drawn from a training dataset. The DCNN encoder is trained using this loss so that positive samples are mapped to points close to the anchor's encoding, while encodings of negative samples are kept away from the anchor's in the representation space. Evaluation results of fingerprint-based positioning on a real-world CSI dataset show that the learned similarity metric improves positioning accuracy significantly compared with other known state-of-the-art methods.  ( 2 min )
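    The loss described above can be sketched as a supervised contrastive objective over encoded CSI samples, where every sample sharing the anchor's label (position) is a positive and the rest are negatives; the formula below is a generic multi-positive InfoNCE-style loss, an assumption rather than the letter's exact definition:

```python
import numpy as np

def sup_contrastive_loss(z, labels, tau=0.1):
    """Multi-positive contrastive loss: pull encodings with the same label
    toward the anchor and push the rest away, with temperature tau."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-norm encodings
    sim = (z @ z.T) / tau                             # pairwise similarities
    n = len(z)
    total = 0.0
    for i in range(n):
        others = np.arange(n) != i
        pos = (labels == labels[i]) & others
        if not pos.any():
            continue
        log_denom = np.log(np.exp(sim[i][others]).sum())
        total += -(sim[i][pos] - log_denom).mean()    # average over positives
    return total / n

# Encodings clustered by label give a much lower loss than scrambled ones.
labels = np.array([0, 0, 1, 1])
z_good = np.array([[1.0, 0.0], [1.0, 0.05], [0.0, 1.0], [0.05, 1.0]])
z_bad = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
```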
    GTNet: A Tree-Based Deep Graph Learning Architecture. (arXiv:2204.12802v1 [cs.LG])
    We propose Graph Tree Networks (GTNets), a deep graph learning architecture with a new general message passing scheme that originates from the tree representation of graphs. In the tree representation, messages propagate upward from the leaf nodes to the root node, and each node preserves its initial information prior to receiving information from its child nodes (neighbors). We formulate a general propagation rule following the nature of message passing in the tree to update a node's feature by aggregating its initial feature and its neighbor nodes' updated features. Two graph representation learning models are proposed within this GTNet architecture - Graph Tree Attention Network (GTAN) and Graph Tree Convolution Network (GTCN), with experimentally demonstrated state-of-the-art performance on several popular benchmark datasets. Unlike the vanilla Graph Attention Network (GAT) and Graph Convolution Network (GCN) which have the "over-smoothing" issue, the proposed GTAN and GTCN models can go deep as demonstrated by comprehensive experiments and rigorous theoretical analysis.  ( 2 min )
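    A minimal sketch of the propagation rule described above, in which every layer re-uses a node's initial feature alongside its neighbors' updated features (the linear-plus-ReLU instantiation and mean aggregation are illustrative assumptions, not the paper's exact operators):

```python
import numpy as np

def tree_layer(H0, H, A, W_self, W_nbr):
    """One propagation step: combine each node's *initial* feature H0 with
    the mean of its neighbors' *updated* features H, as in tree-style
    message passing where a node preserves its own information before
    receiving messages from its children."""
    deg = A.sum(axis=1, keepdims=True).clip(min=1.0)
    nbr_mean = (A @ H) / deg
    return np.maximum(H0 @ W_self + nbr_mean @ W_nbr, 0.0)  # ReLU

rng = np.random.default_rng(0)
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # small star graph
H0 = rng.normal(size=(3, 4))
W_self = 0.3 * rng.normal(size=(4, 4))
W_nbr = 0.3 * rng.normal(size=(4, 4))

H = H0.copy()
for _ in range(8):            # deep stacking: H0 re-enters at every layer
    H = tree_layer(H0, H, A, W_self, W_nbr)
```

    Because H0 re-enters at every layer, repeated application does not wash out node identities the way plain neighborhood averaging does, which is the intuition behind the architecture's resistance to over-smoothing.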
    When Performance is not Enough -- A Multidisciplinary View on Clinical Decision Support. (arXiv:2204.12810v1 [cs.LG])
    Scientific publications about machine learning in healthcare are often about implementing novel methods and boosting performance, at least from a computer science perspective. However, beyond such often short-lived improvements, much more needs to be taken into consideration if we want to arrive at sustainable progress in healthcare. What does it take to actually implement such a system, make it usable for the domain expert, and possibly bring it into practical usage? Targeted at computer scientists, this work presents a multidisciplinary view on machine learning in medical decision support systems and covers information technology, medical, as well as ethical aspects. Along with an implemented risk prediction system in nephrology, challenges and lessons learned in a pilot project are presented.  ( 2 min )
    Learning Green's functions associated with parabolic partial differential equations. (arXiv:2204.12789v1 [math.NA])
    Given input-output pairs from a parabolic partial differential equation (PDE) in any spatial dimension $n\geq 1$, we derive the first theoretically rigorous scheme for learning the associated Green's function $G$. Until now, rigorously learning Green's functions associated with parabolic operators has been a major challenge in the field of scientific machine learning because $G$ may not be square-integrable when $n>1$, and time-dependent PDEs have transient dynamics. By combining the hierarchical low-rank structure of $G$ together with the randomized singular value decomposition, we construct an approximant to $G$ that achieves a relative error of $\smash{\mathcal{O}(\Gamma_\epsilon^{-1/2}\epsilon)}$ in the $L^1$-norm with high probability by using at most $\smash{\mathcal{O}(\epsilon^{-\frac{n+2}{2}}\log(1/\epsilon))}$ input-output training pairs, where $\Gamma_\epsilon$ is a measure of the quality of the training dataset for learning $G$, and $\epsilon>0$ is sufficiently small. Along the way, we extend the low-rank theory of Bebendorf and Hackbusch from elliptic PDEs in dimension $1\leq n\leq 3$ to parabolic PDEs in any dimensions, which shows that Green's functions associated with parabolic PDEs admit a low-rank structure on well-separated domains.  ( 2 min )
    SVD Perspectives for Augmenting DeepONet Flexibility and Interpretability. (arXiv:2204.12670v1 [cs.LG])
    Deep operator networks (DeepONets) are powerful architectures for fast and accurate emulation of complex dynamics. As their remarkable generalization capabilities are primarily enabled by their projection-based attribute, we investigate connections with low-rank techniques derived from the singular value decomposition (SVD). We demonstrate that some of the concepts behind proper orthogonal decomposition (POD)-neural networks can improve DeepONet's design and training phases. These ideas lead us to a methodology extension that we name SVD-DeepONet. Moreover, through multiple SVD analyses, we find that DeepONet inherits from its projection-based attribute strong inefficiencies in representing dynamics characterized by symmetries. Inspired by the work on shifted-POD, we develop flexDeepONet, an architecture enhancement that relies on a pre-transformation network for generating a moving reference frame and isolating the rigid components of the dynamics. In this way, the physics can be represented on a latent space free from rotations, translations, and stretches, and an accurate projection can be performed to a low-dimensional basis. In addition to flexibility and interpretability, the proposed perspectives increase DeepONet's generalization capabilities and computational efficiency. For instance, we show flexDeepONet can accurately surrogate the dynamics of 19 variables in a combustion chemistry application with 95% fewer trainable parameters than the vanilla architecture. We argue that DeepONet and SVD-based methods can reciprocally benefit from each other. In particular, the flexibility of the former in leveraging multiple data sources and multifidelity knowledge in the form of both unstructured data and physics-informed constraints has the potential to greatly extend the applicability of methodologies such as POD and PCA.  ( 2 min )
    Accelerated Continuous-Time Approximate Dynamic Programming via Data-Assisted Hybrid Control. (arXiv:2204.12707v1 [math.OC])
    We introduce a new closed-loop architecture for the online solution of approximate optimal control problems in the context of continuous-time systems. Specifically, we introduce the first algorithm that incorporates dynamic momentum in actor-critic structures to control continuous-time dynamic plants with an affine structure in the input. By incorporating dynamic momentum in our algorithm, we are able to accelerate the convergence properties of the closed-loop system, achieving superior transient performance compared to traditional gradient-descent based techniques. In addition, by leveraging the existence of past recorded data with sufficiently rich information properties, we dispense with the persistence of excitation condition traditionally imposed on the regressors of the critic and the actor. Given that our continuous-time momentum-based dynamics also incorporate periodic discrete-time resets that emulate restarting techniques used in the machine learning literature, we leverage tools from hybrid dynamical systems theory to establish asymptotic stability properties for the closed-loop system. We illustrate our results with a numerical example.  ( 2 min )
    Human-Centered Prior-Guided and Task-Dependent Multi-Task Representation Learning for Action Recognition Pre-Training. (arXiv:2204.12729v1 [cs.CV])
    Recently, much progress has been made in self-supervised action recognition. Most existing approaches emphasize the contrastive relations among videos, including appearance and motion consistency. However, two main issues remain for existing pre-training methods: 1) the learned representation is neutral and not informative for a specific task; 2) multi-task learning-based pre-training sometimes leads to sub-optimal solutions due to inconsistent domains of different tasks. To address the above issues, we propose a novel action recognition pre-training framework, which exploits human-centered prior knowledge to generate more informative representations, and avoids the conflict between multiple tasks by using task-dependent representations. Specifically, we distill knowledge from a human parsing model to enrich the semantic capability of the representation. In addition, we combine knowledge distillation with contrastive learning to constitute a task-dependent multi-task framework. We achieve state-of-the-art performance on two popular benchmarks for the action recognition task, i.e., UCF101 and HMDB51, verifying the effectiveness of our method.  ( 2 min )
    The Multimarginal Optimal Transport Formulation of Adversarial Multiclass Classification. (arXiv:2204.12676v1 [cs.LG])
    We study a family of adversarial multiclass classification problems and provide equivalent reformulations in terms of: 1) a family of generalized barycenter problems introduced in the paper and 2) a family of multimarginal optimal transport (MOT) problems where the number of marginals is equal to the number of classes in the original classification problem. These new theoretical results reveal a rich geometric structure of adversarial learning problems in multiclass classification and extend recent results restricted to the binary classification setting. A direct computational implication of our results is that by solving either the barycenter problem and its dual, or the MOT problem and its dual, we can recover the optimal robust classification rule and the optimal adversarial strategy for the original adversarial problem. Examples with synthetic and real data illustrate our results.  ( 2 min )
    A Multi-Head Convolutional Neural Network With Multi-path Attention improves Image Denoising. (arXiv:2204.12736v1 [cs.CV])
    Recently, convolutional neural networks (CNNs) and attention mechanisms have been widely used in image denoising and achieved satisfactory performance. However, the previous works mostly use a single head to receive the noisy image, limiting the richness of extracted features. Therefore, a novel CNN with multiple heads (MH) named MHCNN is proposed in this paper, whose heads will receive the input images rotated by different rotation angles. MH makes MHCNN simultaneously utilize features of rotated images to remove noise. We also present a novel multi-path attention mechanism (MPA) to integrate these features effectively. Unlike previous attention mechanisms that handle pixel-level, channel-level, and patch-level features, MPA focuses on features at the image level. Experiments show MHCNN surpasses other state-of-the-art CNN models on additive white Gaussian noise (AWGN) denoising and real-world image denoising. Its peak signal-to-noise ratio (PSNR) results are higher than other networks, such as DnCNN, BRDNet, RIDNet, PAN-Net, and CSANN. It is also demonstrated that the proposed MH with MPA mechanism can be used as a pluggable component.  ( 2 min )
    DraftRec: Personalized Draft Recommendation for Winning in Multi-Player Online Battle Arena Games. (arXiv:2204.12750v1 [cs.AI])
    This paper presents a personalized character recommendation system for Multiplayer Online Battle Arena (MOBA) games, which are considered one of the most popular online video game genres around the world. When playing MOBA games, players go through a draft stage, where they alternately select a virtual character to play. When drafting, players select characters by considering not only their character preferences, but also the synergy and competence of their team's character combination. However, the complexity of drafting makes it difficult for beginners to choose appropriate characters based on their team's characters while considering their own champion preferences. To alleviate this problem, we propose DraftRec, a novel hierarchical model which recommends characters by considering each player's champion preferences and the interaction between the players. DraftRec consists of two networks: the player network and the match network. The player network captures the individual player's champion preference, and the match network integrates the complex relationship between the players and their respective champions. We train and evaluate our model on a manually collected dataset of 280,000 League of Legends matches and a publicly available dataset of 50,000 Dota 2 matches. Empirically, our method achieved state-of-the-art performance on the character recommendation and match outcome prediction tasks. Furthermore, a comprehensive user survey confirms that DraftRec provides convincing and satisfying recommendations. Our code and dataset are available at https://github.com/dojeon-ai/DraftRec.  ( 2 min )
    Heterogeneous Ensemble Knowledge Transfer for Training Large Models in Federated Learning. (arXiv:2204.12703v1 [cs.LG])
    Federated learning (FL) enables edge-devices to collaboratively learn a model without disclosing their private data to a central aggregating server. Most existing FL algorithms require models of identical architecture to be deployed across the clients and server, making it infeasible to train large models due to clients' limited system resources. In this work, we propose a novel ensemble knowledge transfer method named Fed-ET in which small models (different in architecture) are trained on clients, and used to train a larger model at the server. Unlike in conventional ensemble learning, in FL the ensemble can be trained on clients' highly heterogeneous data. Cognizant of this property, Fed-ET uses a weighted consensus distillation scheme with diversity regularization that efficiently extracts reliable consensus from the ensemble while improving generalization by exploiting the diversity within the ensemble. We show the generalization bound for the ensemble of weighted models trained on heterogeneous datasets that supports the intuition of Fed-ET. Our experiments on image and language tasks show that Fed-ET significantly outperforms other state-of-the-art FL algorithms with fewer communicated parameters, and is also robust against high data-heterogeneity.  ( 2 min )
    Data-based price discrimination: information theoretic limitations and a minimax optimal strategy. (arXiv:2204.12723v1 [cs.GT])
    This paper studies the gap between the classical pricing theory and the data-based pricing theory. We focus on the problem of price discrimination with a continuum of buyer types based on a finite sample of observations. Our first set of results provides sharp lower bounds in the worst-case scenario for the discrepancy between any data-based pricing strategies and the theoretical optimal third-degree price discrimination (3PD) strategy (respectively, uniform pricing strategy) derived from the distribution (from which the sample is drawn) ranging over a large class of distributions. Consequently, there is an inevitable gap between revenues based on any data-based pricing strategy and the revenue based on the theoretical optimal 3PD (respectively, uniform pricing) strategy. We then propose easy-to-implement data-based 3PD and uniform pricing strategies and show each strategy is minimax optimal in the sense that the gap between their respective revenue and the revenue based on the theoretical optimal 3PD (respectively, uniform pricing) strategy matches our worst-case lower bounds up to constant factors (that are independent of the sample size $n$). We show that 3PD strategies are revenue superior to uniform pricing strategies if and only if the sample size $n$ is large enough. In other words, if $n$ is below a threshold, uniform pricing strategies are revenue superior to 3PD strategies. We further provide upper bounds for the gaps between the welfare generated by our minimax optimal 3PD (respectively, uniform pricing) strategy and the welfare based on the theoretical optimal 3PD (respectively, uniform pricing) strategy.  ( 2 min )
    Relational Abstractions for Generalized Reinforcement Learning on Symbolic Problems. (arXiv:2204.12665v1 [cs.LG])
    Reinforcement learning in problems with symbolic state spaces is challenging due to the need for reasoning over long horizons. This paper presents a new approach that utilizes relational abstractions in conjunction with deep learning to learn a generalizable Q-function for such problems. The learned Q-function can be efficiently transferred to related problems that have different object names and object quantities, and thus, entirely different state spaces. We show that the learned generalized Q-function can be utilized for zero-shot transfer to related problems without an explicit, hand-coded curriculum. Empirical evaluations on a range of problems show that our method facilitates efficient zero-shot transfer of learned knowledge to much larger problem instances containing many objects.  ( 2 min )
    Machines of finite depth: towards a formalization of neural networks. (arXiv:2204.12786v1 [cs.LG])
    We provide a unifying framework where artificial neural networks and their architectures can be formally described as particular cases of a general mathematical construction: machines of finite depth. Unlike neural networks, machines have a precise definition, from which several properties follow naturally. Machines of finite depth are modular (they can be combined), efficiently computable, and differentiable. The backward pass of a machine is again a machine and can be computed without overhead using the same procedure as the forward pass. We prove this statement theoretically and practically, via a unified implementation that generalizes several classical architectures (dense, convolutional, and recurrent neural networks with a rich shortcut structure) and their respective backpropagation rules.  ( 2 min )
    Understanding A Class of Decentralized and Federated Optimization Algorithms: A Multi-Rate Feedback Control Perspective. (arXiv:2204.12663v1 [cs.LG])
    Distributed algorithms have been playing an increasingly important role in many applications such as machine learning, signal processing, and control. Significant research efforts have been devoted to developing and analyzing new algorithms for various applications. In this work, we provide a fresh perspective to understand, analyze, and design distributed optimization algorithms. Through the lens of multi-rate feedback control, we show that a wide class of distributed algorithms, including popular decentralized/federated schemes, can be viewed as discretizing a certain continuous-time feedback control system, possibly with multiple sampling rates, such as decentralized gradient descent, gradient tracking, and federated averaging. This key observation not only allows us to develop a generic framework to analyze the convergence of the entire algorithm class, but also leads to an interesting way of designing new distributed algorithms. We develop the theory behind our framework and provide examples to highlight how the framework can be used in practice.  ( 2 min )
    Gaussian Kernel Variance For an Adaptive Learning Method on Signals Over Graphs. (arXiv:2204.12629v1 [eess.SP])
    This paper discusses a simple yet potentially powerful algorithm, called the single-kernel Gradraker (SKG), which is an adaptive learning method that predicts unknown nodal values in a network using known nodal values and the network structure. We aim to find out how to configure the model when applying the algorithm. To be more specific, we focus on SKG with a Gaussian kernel and specify how to find a suitable variance for the kernel. To do so, we introduce two variables with which we are able to set up requirements on the variance of the Gaussian kernel to achieve (near-)optimal performance and to better understand how SKG works. Our contribution is that we introduce two variables as analysis tools, illustrate how predictions will be affected under different Gaussian kernels, and provide an algorithm for finding a suitable Gaussian kernel for SKG given knowledge about the training network. Simulation results on real datasets are provided.  ( 2 min )
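    The role of the kernel variance can be illustrated with a generic Gaussian-kernel estimate of an unknown nodal value from known ones (a plain kernel-regression sketch, not the Gradraker update itself):

```python
import numpy as np

def gaussian_kernel_predict(x_known, y_known, x_query, sigma2):
    """Weight each known nodal value by a Gaussian kernel of variance
    sigma2 applied to its squared distance from the query node."""
    d2 = np.sum((x_known - x_query) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * sigma2))
    return float(w @ y_known / w.sum())

# Known nodal features/values; the query sits next to the first node.
x = np.array([[0.0], [1.0], [10.0]])
y = np.array([0.0, 1.0, 10.0])
q = np.array([0.1])

small = gaussian_kernel_predict(x, y, q, sigma2=0.01)  # ~nearest neighbor
large = gaussian_kernel_predict(x, y, q, sigma2=1e6)   # ~global mean
```

    Too small a variance collapses the estimate onto the nearest neighbor; too large a variance flattens all weights toward the global mean, which is why a suitable variance sits between these extremes.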
    Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback. (arXiv:2204.12764v1 [cs.LG])
    We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, the losses of an action are split into $d$ components, spreading over consecutive rounds after the action is chosen. In each round, the algorithm observes the aggregation of losses that come from the latest $d$ rounds. Previous works focus on the oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show the non-oblivious setting incurs $\Omega(T)$ pseudo regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm which enjoys $o(T)$ policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory. In particular, for the $K$-armed bandit and bandit convex optimization, we have an $\mathcal{O}(T^{2/3})$ policy regret bound. We also prove a matching lower bound for the $K$-armed bandit. Our lower bound works even when the loss sequence is oblivious but the delay is non-oblivious. It answers the open problem proposed in \cite{wang2021adaptive}, showing that non-oblivious delay is enough to incur $\tilde{\Omega}(T^{2/3})$ regret.  ( 2 min )
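    The feedback model described above can be made concrete in a few lines: each action's loss splits into $d$ components spilled over the following rounds, and the learner sees only the per-round aggregate (a direct sketch of the setting, with illustrative numbers):

```python
import numpy as np

def observed_losses(components):
    """components[t, i] is the i-th loss component of the action chosen at
    round t, incurred i rounds later. At round t the learner observes only
    the anonymous aggregate sum_i components[t - i, i]."""
    T, d = components.shape
    obs = np.zeros(T)
    for t in range(T):
        for i in range(min(d, t + 1)):
            obs[t] += components[t - i, i]
    return obs

# d = 2: at round t the learner sees round t's first component plus
# round t-1's second component, with no attribution to either action.
comps = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
obs = observed_losses(comps)
```

    The anonymity is what makes the problem hard: the aggregate gives no information about which past action contributed which share of the observed loss.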
    SCGC : Self-Supervised Contrastive Graph Clustering. (arXiv:2204.12656v1 [cs.LG])
    Graph clustering discovers groups or communities within networks. Deep learning methods such as autoencoders (AE) extract effective clustering and downstream representations but cannot incorporate rich structural information. While Graph Neural Networks (GNN) have shown great success in encoding graph structure, typical GNNs based on convolution or attention variants suffer from over-smoothing, noise, and heterophily, are computationally expensive, and typically require the complete graph to be present. Instead, we propose Self-Supervised Contrastive Graph Clustering (SCGC), which imposes graph structure via contrastive loss signals to learn discriminative node representations and iteratively refined soft cluster labels. We also propose SCGC*, with a more effective, novel Influence Augmented Contrastive (IAC) loss to fuse richer structural information, and half the original model parameters. SCGC(*) is fast, with simple linear units that completely eliminate the convolutions and attention of traditional GNNs, yet efficiently incorporates structure. It is impervious to layer depth and robust to over-smoothing, incorrect edges, and heterophily. It is scalable by batching, a limitation in many prior GNN models, and trivially parallelizable. We obtain significant improvements over the state-of-the-art on a wide range of benchmark graph datasets, including images, sensor data, text, and citation networks. Specifically, we improve ARI by 20% and NMI by 18% on DBLP, and reduce training time by 55% and inference time by 81% overall. Our code is available at: https://github.com/gayanku/SCGC  ( 2 min )
    Evaluation of Self-taught Learning-based Representations for Facial Emotion Recognition. (arXiv:2204.12624v1 [cs.CV])
    This work describes different strategies to generate unsupervised representations obtained through the concept of self-taught learning for facial emotion recognition (FER). The idea is to create complementary representations promoting diversity by varying the autoencoders' initialization, architecture, and training data. SVM, Bagging, Random Forest, and a dynamic ensemble selection method are evaluated as final classification methods. Experimental results on Jaffe and Cohn-Kanade datasets using a leave-one-subject-out protocol show that FER methods based on the proposed diverse representations compare favorably against state-of-the-art approaches that also explore unsupervised feature learning.  ( 2 min )
    Meta-Learning Based Early Fault Detection for Rolling Bearings via Few-Shot Anomaly Detection. (arXiv:2204.12637v1 [cs.LG])
    Early fault detection (EFD) of rolling bearings can recognize slight deviations in health states and contributes to the stability of mechanical systems. In practice, very limited target bearing data are available to conduct EFD, which makes it hard to adapt to the EFD task of new bearings. To address this problem, many transfer learning based EFD methods utilize historical data to learn transferable domain knowledge and conduct early fault detection on new target bearings. However, most existing methods only consider the distribution drift across different working conditions but ignore the difference between bearings under the same working condition, which is called Unit-to-Unit Variability (UtUV). The setting of EFD with limited target data considering UtUV can be formulated as a few-shot anomaly detection task. Therefore, this paper proposes a novel meta-learning based EFD method that accounts for UtUV. The proposed method learns a generic metric based on a Relation Network (RN) to measure the similarity between normal data and newly arriving target bearing data. In addition, the proposed method utilizes a health state embedding strategy to decrease false alarms. The performance of the proposed method is tested on two bearing datasets. The results show that the proposed method can detect incipient faults earlier than the baselines with fewer false alarms.  ( 2 min )
    Novel Applications for VAE-based Anomaly Detection Systems. (arXiv:2204.12577v1 [cs.LG])
    The recent rise in deep learning technologies fueled innovation and boosted scientific research. Their achievements enabled new research directions for deep generative modeling (DGM), an increasingly popular approach that can create novel and unseen data, starting from a given data set. As the technology shows promising applications, many ethical issues also arise. For example, their misuse can enable disinformation campaigns and powerful phishing attempts. Research also indicates different biases affect deep learning models, leading to social issues such as misrepresentation. In this work, we formulate a novel setting to deal with similar problems, showing that a repurposed anomaly detection system effectively generates novel data, avoiding generating specified unwanted data. We propose Variational Auto-encoding Binary Classifiers (V-ABC): a novel model that repurposes and extends the Auto-encoding Binary Classifier (ABC) anomaly detector, using the Variational Auto-encoder (VAE). We survey the limitations of existing approaches and explore many tools to show the model's inner workings in an interpretable way. This proposal has excellent potential for generative applications: models that rely on user-generated data could automatically filter out unwanted content, such as offensive language, obscene images, and misleading information.  ( 2 min )
    Self-scalable Tanh (Stan): Faster Convergence and Better Generalization in Physics-informed Neural Networks. (arXiv:2204.12589v1 [cs.LG])
    Physics-informed Neural Networks (PINNs) are gaining attention in the engineering and scientific literature for solving a range of differential equations with applications in weather modeling, healthcare, manufacturing, and so on. Poor scalability is one of the barriers to utilizing PINNs for many real-world problems. To address this, a Self-scalable tanh (Stan) activation function is proposed for the PINNs. The proposed Stan function is smooth, non-saturating, and has a trainable parameter. During training, it can allow easy flow of gradients to compute the required derivatives and also enable systematic scaling of the input-output mapping. It is also shown theoretically that the PINN with the proposed Stan function has no spurious stationary points when using gradient descent algorithms. The proposed Stan is tested on a couple of numerical studies involving general regression problems. It is subsequently used for solving multiple forward problems, which involve second-order derivatives and multiple dimensions, and an inverse problem where the thermal diffusivity is predicted through heat conduction in a rod. Our results of these case studies establish empirically that the Stan activation function can achieve better training and more accurate predictions than the state-of-the-art activation functions.  ( 2 min )
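    A minimal sketch of the Stan activation as we read the abstract: a tanh modulated by a trainable scaling term, here written as stan(x) = tanh(x) + beta * x * tanh(x) with a scalar beta initialised to 1. Treat this exact parameterisation as an assumption based on the description, not a definitive reproduction of the paper:

```python
import numpy as np

# Self-scalable tanh sketch: smooth, non-saturating (grows like beta*x for
# large |x|), with beta as the trainable parameter (a scalar here; the
# paper may use per-neuron parameters).
def stan(x, beta=1.0):
    t = np.tanh(x)
    return t + beta * x * t

def stan_grad(x, beta=1.0):
    # d/dx [tanh(x) + beta*x*tanh(x)]
    #   = sech^2(x) * (1 + beta*x) + beta*tanh(x)
    t = np.tanh(x)
    sech2 = 1.0 - t * t
    return sech2 * (1.0 + beta * x) + beta * t

x = np.linspace(-3.0, 3.0, 7)
y = stan(x)
```

The closed-form gradient makes the "easy flow of gradients" claim concrete: unlike plain tanh, the derivative does not vanish for large inputs, since the beta*tanh(x) term stays bounded away from zero.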
    Rate-Constrained Remote Contextual Bandits. (arXiv:2204.12620v1 [cs.LG])
    We consider a rate-constrained contextual multi-armed bandit (RC-CMAB) problem, in which a group of agents are solving the same contextual multi-armed bandit (CMAB) problem. However, the contexts are observed by a remotely connected entity, i.e., the decision-maker, that updates the policy to maximize the returned rewards, and communicates the arms to be sampled by the agents to a controller over a rate-limited communications channel. This framework can be applied to personalized ad placement, whenever the content owner observes the website visitors, and hence has the context, but needs to transmit the ads to be shown to a controller that is in charge of placing the marketing content. Consequently, the rate-constrained CMAB (RC-CMAB) problem requires the study of lossy compression schemes for the policy to be employed whenever the constraint on the channel rate does not allow the uncompressed transmission of the decision-maker's intentions. We characterize the fundamental information theoretic limits of this problem by letting the number of agents go to infinity, and study the regret that can be achieved, identifying the two distinct rate regions leading to linear and sub-linear regrets respectively. We then analyze the optimal compression scheme achievable in the limit with infinite agents, when using the forward and reverse KL divergence as distortion metric. Based on this, we also propose a practical coding scheme, and provide numerical results.  ( 2 min )
    RAMBO-RL: Robust Adversarial Model-Based Offline Reinforcement Learning. (arXiv:2204.12581v1 [cs.LG])
    Offline reinforcement learning (RL) aims to find near-optimal policies from logged data without further environment interaction. Model-based algorithms, which learn a model of the environment from the dataset and perform conservative policy optimisation within that model, have emerged as a promising approach to this problem. In this work, we present Robust Adversarial Model-Based Offline RL (RAMBO), a novel approach to model-based offline RL. To achieve conservatism, we formulate the problem as a two-player zero-sum game against an adversarial environment model. The model is trained to minimise the value function while still accurately predicting the transitions in the dataset, forcing the policy to act conservatively in areas not covered by the dataset. To approximately solve the two-player game, we alternate between optimising the policy and optimising the model adversarially. The problem formulation that we address is theoretically grounded, resulting in a PAC performance guarantee and a pessimistic value function which lower bounds the value function in the true environment. We evaluate our approach on widely studied offline RL benchmarks, and demonstrate that our approach achieves state-of-the-art performance.  ( 2 min )
    hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification. (arXiv:2204.12587v1 [cs.MM])
    Social media platforms often act as breeding grounds for various forms of trolling or malicious content targeting users or communities. One way of trolling users is by creating memes, which in most cases unite an image with a short piece of text embedded on top of it. The situation is more complex for multilingual (e.g., Tamil) memes due to the lack of benchmark datasets and models. We explore several models to detect troll memes in Tamil based on the shared task, "Troll Meme Classification in DravidianLangTech2022" at ACL-2022. We observe that while the text-based model MURIL performs better for non-troll meme classification, the image-based model VGG16 performs better for troll-meme classification. Further fusing these two modalities helps us achieve stable outcomes in both classes. Our fusion model achieved a 0.561 weighted average F1 score and ranked second in this task.  ( 2 min )
    Protein 3D structure-based neural networks highly improve the accuracy in compound-protein binding affinity prediction. (arXiv:2204.12586v1 [q-bio.BM])
    Theoretically, the accuracy of computational models in predicting compound-protein binding affinities (CPAs) could be improved by the introduction of protein 3D structure information. However, most of these models still suffer from a low accuracy due to the lack of an efficient approach to encode informative protein features. The major challenge is how to combine the multi-modal information such as the residue sequence of the protein, residue atom coordinates and the torsion angles. To tackle this problem, we develop Fast Evolutional Attention and Thoroughgoing-graph Neural Networks (FeatNN) to facilitate the application of protein 3D structure information for predicting CPAs. Specifically, we established a novel end-to-end architecture to jointly embed torsion matrix, discrete distance matrix, and sequence information of protein and extract compound features with deep graph convolution layers. In addition, a new pairwise mapping attention mechanism is introduced to comprehensively learn potential interaction information between proteins and compounds. FeatNN considerably outperforms various state-of-the-art baselines in CPA prediction with the Pearson value elevated by about 35.7%. Thus, FeatNN provides an outstanding method for highly accurate CPA prediction and facilitates high-throughput virtual screening of drug candidates.  ( 2 min )
    Learning Eco-Driving Strategies at Signalized Intersections. (arXiv:2204.12561v1 [eess.SY])
    Signalized intersections in arterial roads result in persistent vehicle idling and excess accelerations, contributing to fuel consumption and CO2 emissions. There has thus been a line of work studying eco-driving control strategies to reduce fuel consumption and emission levels at intersections. However, methods to devise effective control strategies across a variety of traffic settings remain elusive. In this paper, we propose a reinforcement learning (RL) approach to learn effective eco-driving control strategies. We analyze the potential impact of a learned strategy on fuel consumption, CO2 emission, and travel time and compare with naturalistic driving and model-based baselines. We further demonstrate the generalizability of the learned policies under mixed traffic scenarios. Simulation results indicate that scenarios with 100% penetration of connected autonomous vehicles (CAV) may yield as high as 18% reduction in fuel consumption and 25% reduction in CO2 emission levels while even improving travel speed by 20%. Furthermore, results indicate that even 25% CAV penetration can bring at least 50% of the total fuel and emission reduction benefits.  ( 2 min )
    SoFaiR: Single Shot Fair Representation Learning. (arXiv:2204.12556v1 [cs.LG])
    To avoid discriminatory uses of their data, organizations can learn to map them into a representation that filters out information related to sensitive attributes. However, all existing methods in fair representation learning generate a fairness-information trade-off. To achieve different points on the fairness-information plane, one must train different models. In this paper, we first demonstrate that fairness-information trade-offs are fully characterized by rate-distortion trade-offs. Then, we use this key result and propose SoFaiR, a single shot fair representation learning method that generates with one trained model many points on the fairness-information plane. Besides its computational saving, our single-shot approach is, to the extent of our knowledge, the first fair representation learning method that explains what information is affected by changes in the fairness / distortion properties of the representation. Empirically, we find on three datasets that SoFaiR achieves similar fairness-information trade-offs as its multi-shot counterparts.  ( 2 min )
    Surrogate Assisted Evolutionary Multi-objective Optimisation applied to a Pressure Swing Adsorption system. (arXiv:2204.12585v1 [cs.NE])
    Chemical plant design and optimisation have proven challenging due to the complexity of these real-world systems. The resulting complexity translates into high computational costs for these systems' mathematical formulations and simulation models. Research has illustrated the benefits of using machine learning surrogate models as substitutes for computationally expensive models during optimisation. This paper extends recent research into optimising chemical plant design and operation. The study further explores Surrogate Assisted Genetic Algorithms (SA-GA) in more complex variants of the original plant design and optimisation problems, such as the inclusion of parallel and feedback components. The novel extension to the original algorithm proposed in this study, Surrogate Assisted NSGA-II (SA-NSGA), was tested on a popular literature case, the Pressure Swing Adsorption (PSA) system. We further provide extensive experimentation, comparing various meta-heuristic optimisation techniques and numerous machine learning models as surrogates. The results for both sets of systems illustrate the benefits of using Genetic Algorithms as an optimisation framework for complex chemical plant system design and optimisation in both single and multi-objective scenarios. We confirm that Random Forest surrogate assisted Evolutionary Algorithms can be scaled to increasingly complex chemical systems with parallel and feedback components. We further find that combining a Genetic Algorithm framework with machine learning surrogate models as a substitute for long-running simulation models yields significant computational efficiency improvements: a 1.7-1.84 times speedup for the increased-complexity examples and a 2.7 times speedup for the Pressure Swing Adsorption system.  ( 2 min )
    Toward Policy Explanations for Multi-Agent Reinforcement Learning. (arXiv:2204.12568v1 [cs.AI])
    Advances in multi-agent reinforcement learning (MARL) enable sequential decision making for a range of exciting multi-agent applications such as cooperative AI and autonomous driving. Explaining agent decisions is crucial for improving system transparency, increasing user satisfaction, and facilitating human-agent collaboration. However, existing works on explainable reinforcement learning mostly focus on the single-agent setting and are not suitable for addressing challenges posed by multi-agent environments. We present novel methods to generate two types of policy explanations for MARL: (i) policy summarization about the agent cooperation and task sequence, and (ii) language explanations to answer queries about agent behavior. Experimental results on three MARL domains demonstrate the scalability of our methods. A user study shows that the generated explanations significantly improve user performance and increase subjective ratings on metrics such as user satisfaction.  ( 2 min )
    Process Knowledge-infused Learning for Suicidality Assessment on Social Media. (arXiv:2204.12560v1 [cs.AI])
    Improving the performance and natural language explanations of deep learning algorithms is a priority for adoption by humans in the real world. In several domains, such as healthcare, such technology has significant potential to reduce the burden on humans by providing quality assistance at scale. However, current methods rely on the traditional pipeline of predicting labels from data, thus completely ignoring the process and guidelines used to obtain the labels. Furthermore, post hoc explanations of the data-to-label prediction using explainable AI (XAI) models, while satisfactory to computer scientists, leave much to be desired by end-users due to lacking explanations of the process in terms of human-understandable concepts. We \textit{introduce}, \textit{formalize}, and \textit{develop} a novel Artificial Intelligence (AI) paradigm -- Process Knowledge-infused Learning (PK-iL). PK-iL utilizes structured process knowledge that explicitly explains the underlying prediction process in a way that makes sense to end-users. A qualitative human evaluation confirms, with an annotator agreement of 0.72, that humans understand the explanations for the predictions. PK-iL also performs competitively with the state-of-the-art (SOTA) baselines.  ( 2 min )
    Data Bootstrapping Approaches to Improve Low Resource Abusive Language Detection for Indic Languages. (arXiv:2204.12543v1 [cs.CL])
    Abusive language is a growing concern in many social media platforms. Repeated exposure to abusive speech has created physiological effects on the target users. Thus, the problem of abusive language should be addressed in all forms for online peace and safety. While extensive research exists in abusive speech detection, most studies focus on English. Recently, many smearing incidents have occurred in India, which provoked diverse forms of abusive speech in online space in various languages based on the geographic location. Therefore it is essential to deal with such malicious content. In this paper, to bridge the gap, we demonstrate a large-scale analysis of multilingual abusive speech in Indic languages. We examine different interlingual transfer mechanisms and observe the performance of various multilingual models for abusive speech detection for eight different Indic languages. We also experiment to show how robust these models are on adversarial attacks. Finally, we conduct an in-depth error analysis by looking into the models' misclassified posts across various settings. We have made our code and models public for other researchers.  ( 2 min )
    Double Diffusion Maps and their Latent Harmonics for Scientific Computations in Latent Space. (arXiv:2204.12536v1 [stat.ML])
    We introduce a data-driven approach to building reduced dynamical models through manifold learning; the reduced latent space is discovered using Diffusion Maps (a manifold learning technique) on time series data. A second round of Diffusion Maps on those latent coordinates allows the approximation of the reduced dynamical models. This second round enables mapping the latent space coordinates back to the full ambient space (what is called lifting); it also enables the approximation of full state functions of interest in terms of the reduced coordinates. In our work, we develop and test three different reduced numerical simulation methodologies, either through pre-tabulation in the latent space and integration on the fly or by going back and forth between the ambient space and the latent space. The data-driven latent space simulation results, based on the three different approaches, are validated through (a) the latent space observation of the full simulation through the Nystr\"om Extension formula, or through (b) lifting the reduced trajectory back to the full ambient space, via Latent Harmonics. Latent space modeling often involves additional regularization to favor certain properties of the space over others, and the mapping back to the ambient space is then constructed mostly independently from these properties; here, we use the same data-driven approach to construct the latent space and then map back to the ambient space.  ( 2 min )
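    The first round of Diffusion Maps referred to above can be sketched in a few lines: build a Gaussian kernel on the data, row-normalise it into a Markov matrix, and take the leading non-trivial eigenvectors as latent coordinates. The dataset, kernel scale heuristic, and embedding dimension below are illustrative choices, not the paper's settings:

```python
import numpy as np

# Minimal Diffusion Maps sketch (the manifold-learning building block).
def diffusion_maps(X, n_components=2, eps=None):
    # Pairwise squared distances and Gaussian kernel.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    if eps is None:
        eps = np.median(d2)               # common heuristic for the scale
    K = np.exp(-d2 / eps)
    # Row-normalise to a Markov transition matrix.
    P = K / K.sum(axis=1, keepdims=True)
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    vals, vecs = vals.real[order], vecs.real[:, order]
    # Drop the trivial constant eigenvector (eigenvalue 1); scale the rest
    # by their eigenvalues to obtain the diffusion coordinates.
    return vals[1:n_components + 1] * vecs[:, 1:n_components + 1]

rng = np.random.default_rng(1)
theta = rng.uniform(0, 2 * np.pi, 60)
X = np.c_[np.cos(theta), np.sin(theta)]   # points sampled on a circle
Y = diffusion_maps(X, n_components=2)
```

The "second round" described in the abstract would apply the same construction again, this time on the latent coordinates Y, to support lifting and function approximation in the reduced space.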
    Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production. (arXiv:2204.12526v1 [q-bio.QM])
    In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two approaches, the Least Absolute Shrinkage and Selection Operator (Lasso) and Random Forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of the Lasso method fails to associate any pathway with the MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the potential of a chassis that restricts behavior or functionality by deactivating selected pathways in cellulose production.  ( 2 min )
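    The Lasso feature-selection idea can be illustrated on synthetic "pathway gene count" data (a hypothetical example; the column layout and regularisation strength are made up, not the paper's data): with L1 regularisation, coefficients of uninformative pathways are driven exactly to zero.

```python
import numpy as np

# Synthetic counts: only the first "pathway" truly drives the response.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.poisson(5.0, size=(n, p)).astype(float)
X = (X - X.mean(0)) / X.std(0)            # standardise the counts
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=n)

def lasso_cd(X, y, lam, n_iter=200):
    # Coordinate descent with soft-thresholding; assumes standardised
    # columns so each coordinate update is closed-form.
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            rho = X[:, j] @ r / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)
    return beta

beta = lasso_cd(X, y, lam=0.2)
```

On this data the informative column keeps a large coefficient while the shrinkage zeroes out the rest, mirroring how Lasso singles out pathways such as bacterial chemotaxis while discarding uninformative ones.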
    Multi stain graph fusion for multimodal integration in pathology. (arXiv:2204.12541v1 [eess.IV])
    In pathology, tissue samples are assessed using multiple staining techniques to enhance contrast in unique histologic features. In this paper, we introduce a multimodal CNN-GNN based graph fusion approach that leverages complementary information from multiple non-registered histopathology images to predict pathologic scores. We demonstrate this approach in nonalcoholic steatohepatitis (NASH) by predicting CRN fibrosis stage and NAFLD Activity Score (NAS). Primary assessment of NASH typically requires liver biopsy evaluation on two histological stains: Trichrome (TC) and hematoxylin and eosin (H&E). Our multimodal approach learns to extract complementary information from TC and H&E graphs corresponding to each stain while simultaneously learning an optimal policy to combine this information. We report up to 20% improvement in predicting fibrosis stage and NAS component grades over single-stain modeling approaches, measured by computing linearly weighted Cohen's kappa between machine-derived vs. pathologist consensus scores. Broadly, this paper demonstrates the value of leveraging diverse pathology images for improved ML-powered histologic assessment.  ( 2 min )
    Application of WGAN-GP in recommendation and Questioning the relevance of GAN-based approaches. (arXiv:2204.12527v1 [cs.IR])
    Many neural-based recommender systems were proposed in recent years, and some of them used Generative Adversarial Networks (GAN) to model user-item interactions. However, the exploration of Wasserstein GAN with Gradient Penalty (WGAN-GP) for recommendation has received relatively little scrutiny. In this paper, we focus on two questions: (1) can we successfully apply WGAN-GP to recommendation, and does this approach give an advantage compared to the best GAN models? (2) are GAN-based recommender systems relevant? To answer the first question, we propose a recommender system based on WGAN-GP called CFWGAN-GP, which is founded on a previous model (CFGAN). We successfully applied our method to the top-k recommendation task on real-world datasets, and the empirical results show that it is competitive with state-of-the-art GAN approaches, but we found no evidence of a significant advantage of using WGAN-GP instead of the original GAN, at least from the accuracy point of view. As for the second question, we conduct a simple experiment in which we show that a well-tuned, conceptually simpler method outperforms GAN-based models by a considerable margin, questioning the use of such models.  ( 2 min )
    Self-Supervised Information Bottleneck for Deep Multi-View Subspace Clustering. (arXiv:2204.12496v1 [cs.LG])
    In this paper, we explore the problem of deep multi-view subspace clustering framework from an information-theoretic point of view. We extend the traditional information bottleneck principle to learn common information among different views in a self-supervised manner, and accordingly establish a new framework called Self-supervised Information Bottleneck based Multi-view Subspace Clustering (SIB-MSC). Inheriting the advantages from information bottleneck, SIB-MSC can learn a latent space for each view to capture common information among the latent representations of different views by removing superfluous information from the view itself while retaining sufficient information for the latent representations of other views. Actually, the latent representation of each view provides a kind of self-supervised signal for training the latent representations of other views. Moreover, SIB-MSC attempts to learn the other latent space for each view to capture the view-specific information by introducing mutual information based regularization terms, so as to further improve the performance of multi-view subspace clustering. To the best of our knowledge, this is the first work to explore information bottleneck for multi-view subspace clustering. Extensive experiments on real-world multi-view data demonstrate that our method achieves superior performance over the related state-of-the-art methods.  ( 2 min )
    Enhancing Privacy against Inversion Attacks in Federated Learning by using Mixing Gradients Strategies. (arXiv:2204.12495v1 [cs.LG])
    Federated learning reduces the risk of information leakage, but remains vulnerable to attacks. We investigate how several neural network design decisions can defend against gradient inversion attacks. We show that overlapping gradients provide numerical resistance to gradient inversion on the highly vulnerable dense layer. Specifically, we propose to leverage batching to maximise the mixing of gradients by choosing an appropriate loss function and drawing identical labels. We show that otherwise it is possible to directly recover all vectors in a mini-batch without any numerical optimisation due to the de-mixing nature of the cross entropy loss. To accurately assess data recovery, we introduce an absolute variation distance (AVD) metric for information leakage in images, derived from total variation. In contrast to standard metrics, e.g. Mean Squared Error or Structural Similarity Index, AVD offers a continuous metric for extracting information in noisy images. Finally, our empirical results on information recovery from various inversion attacks and training performance support our defense strategies. These strategies are also shown to be useful for deep convolutional neural networks such as LeNet for image recognition. We hope that this study will help guide the development of further strategies that achieve a trustful federation policy.  ( 2 min )
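    A total-variation-based image distance in the spirit of the AVD metric mentioned above might look as follows. The exact definition in the paper may differ; this reading (normalised absolute difference of per-image total variation) is an assumption for illustration only:

```python
import numpy as np

# Sketch of a TV-derived image distance (hypothetical AVD reading).
def total_variation(img):
    # Anisotropic total variation: sum of absolute finite differences.
    dx = np.abs(np.diff(img, axis=0)).sum()
    dy = np.abs(np.diff(img, axis=1)).sum()
    return dx + dy

def avd(a, b):
    # Normalise by image size so the score is resolution-independent.
    return abs(total_variation(a) - total_variation(b)) / a.size

rng = np.random.default_rng(0)
clean = np.zeros((8, 8))
clean[2:6, 2:6] = 1.0                      # a simple square pattern
noisy = clean + 0.3 * rng.normal(size=clean.shape)
```

Unlike pixelwise MSE, a TV-based score keeps changing smoothly as structure emerges from noise, which matches the abstract's motivation of a continuous leakage measure for noisy reconstructions.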
    AI-Assisted Authentication: State of the Art, Taxonomy and Future Roadmap. (arXiv:2204.12492v1 [cs.CR])
    Artificial Intelligence (AI) has found its applications in a variety of environments ranging from data science to cybersecurity. AI helps break through the limitations of traditional algorithms and provides more efficient and flexible methods for solving problems. In this paper, we focus on the applications of artificial intelligence in authentication, which is used in a wide range of scenarios including facial recognition to access buildings, keystroke dynamics to unlock smartphones. With the emerging AI-assisted authentication schemes, our comprehensive survey provides an overall understanding on a high level, which paves the way for future research in this area. In contrast to other relevant surveys, our research is the first of its kind to focus on the roles of AI in authentication.  ( 2 min )
    One-shot Federated Learning without Server-side Training. (arXiv:2204.12493v1 [cs.LG])
    Federated Learning (FL) has recently made significant progress as a new machine learning paradigm for privacy protection. Due to the high communication cost of traditional FL, one-shot federated learning is gaining popularity as a way to reduce communication cost between clients and the server. Most of the existing one-shot FL methods are based on Knowledge Distillation; however, distillation-based approaches require an extra training phase and depend on publicly available data sets. In this work, we consider a novel and challenging setting: performing a single round of parameter aggregation on the local models without server-side training on a public data set. In this new setting, we propose an effective algorithm for Model Aggregation via Exploring Common Harmonized Optima (MA-Echo), which iteratively updates the parameters of all local models to bring them close to a common low-loss area on the loss surface, without harming performance on their own data sets at the same time. Compared to the existing methods, MA-Echo can work well even in extremely non-identical data distribution settings where the support categories of each local model have no overlapping labels with those of the others. We conduct extensive experiments on two popular image classification data sets to compare the proposed method with existing methods and demonstrate the effectiveness of MA-Echo, which clearly outperforms the state of the art.  ( 2 min )
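    For context, the naive one-shot baseline that MA-Echo improves upon is plain weighted averaging of local model parameters in a single round (FedAvg-style); the iterative harmonisation toward a common low-loss region is not reproduced here:

```python
import numpy as np

# One-shot parameter aggregation baseline: average each named parameter
# across clients, optionally weighted (e.g. by local dataset size).
def one_shot_average(local_models, weights=None):
    n = len(local_models)
    if weights is None:
        weights = np.full(n, 1.0 / n)
    keys = local_models[0].keys()
    return {k: sum(w * m[k] for w, m in zip(weights, local_models))
            for k in keys}

# Three toy clients whose parameters are constant arrays 0, 1, 2.
clients = [{"W": np.full((2, 2), float(i)), "b": np.array([float(i)])}
           for i in range(3)]
global_model = one_shot_average(clients)
```

With highly non-identical local data, this direct average can land far from any low-loss region, which is exactly the failure mode MA-Echo's parameter harmonisation is designed to avoid.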
  • Open

    BINAS: Bilinear Interpretable Neural Architecture Search. (arXiv:2110.12399v3 [cs.LG] UPDATED)
    Practical use of neural networks often involves requirements on latency, energy and memory among others. A popular approach to find networks under such requirements is through constrained Neural Architecture Search (NAS). However, previous methods use complicated predictors for the accuracy of the network. Those predictors are hard to interpret and sensitive to many hyperparameters to be tuned, hence, the resulting accuracy of the generated models is often harmed. In this work we resolve this by introducing Bilinear Interpretable Neural Architecture Search (BINAS), that is based on an accurate and simple bilinear formulation of both an accuracy estimator and the expected resource requirement, together with a scalable search method with theoretical guarantees. The simplicity of our proposed estimator together with the intuitive way it is constructed bring interpretability through many insights about the contribution of different design choices. For example, we find that in the examined search space, adding depth and width is more effective at deeper stages of the network and at the beginning of each resolution stage. Our experiments show that BINAS generates comparable to or better architectures than other state-of-the-art NAS methods within a reduced marginal search cost, while strictly satisfying the resource constraints.  ( 2 min )
    Tensor decomposition for learning Gaussian mixtures from moments. (arXiv:2106.00555v2 [math.AG] UPDATED)
    In data processing and machine learning, an important challenge is to recover and exploit models that can represent accurately the data. We consider the problem of recovering Gaussian mixture models from datasets. We investigate symmetric tensor decomposition methods for tackling this problem, where the tensor is built from empirical moments of the data distribution. We consider identifiable tensors, which have a unique decomposition, showing that moment tensors built from spherical Gaussian mixtures have this property. We prove that symmetric tensors with interpolation degree strictly less than half their order are identifiable and we present an algorithm, based on simple linear algebra operations, to compute their decomposition. Illustrative experimentations show the impact of the tensor decomposition method for recovering Gaussian mixtures, in comparison with other state-of-the-art approaches.  ( 2 min )
    Identification of feasible pathway information for c-di-GMP binding proteins in cellulose production. (arXiv:2204.12526v1 [q-bio.QM])
    In this paper, we utilize a machine learning approach to identify the significant pathways for c-di-GMP signaling proteins. The dataset involves gene counts from 12 pathways and 5 essential c-di-GMP binding domains for 1024 bacterial genomes. Two approaches, the Least absolute shrinkage and selection operator (Lasso) and Random forests, have been applied for analyzing and modeling the dataset. Both approaches show that bacterial chemotaxis is the most essential pathway for c-di-GMP encoding domains. Though popular for feature selection, the strong regularization of the Lasso method fails to associate any pathway with the MshE domain. Results from the analysis may help to understand and emphasize the supporting pathways involved in bacterial cellulose production. These findings demonstrate the need for a chassis to restrict behavior or functionality by deactivating the selective pathways in cellulose production.  ( 2 min )
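    The way Lasso's $\ell_1$ penalty can drop a predictor entirely (as it does for the MshE domain above) is visible in a minimal coordinate-descent Lasso; the data and penalty below are made up for illustration:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Minimal coordinate-descent Lasso for (1/2n)||y - Xw||^2 + lam*||w||_1."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ w + X[:, j] * w[j]            # residual excluding feature j
            rho = X[:, j] @ r / n
            z = (X[:, j] ** 2).sum() / n
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z  # soft threshold
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 3.0 * X[:, 0] + 0.05 * rng.standard_normal(200)   # only feature 0 matters
w = lasso_cd(X, y, lam=0.5)
print(w)  # feature 0 large; the weak features are shrunk exactly to zero
```

    The soft-threshold step sets a coefficient to exactly zero whenever its partial correlation with the residual falls below the regularization strength, which is how a whole pathway can be excluded from the model.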
    Efficient Learning of the Parameters of Non-Linear Models using Differentiable Resampling in Particle Filters. (arXiv:2111.01409v2 [stat.ML] UPDATED)
    It has been widely documented that the sampling and resampling steps in particle filters cannot be differentiated. The {\itshape reparameterisation trick} was introduced to allow the sampling step to be reformulated into a differentiable function. We extend the {\itshape reparameterisation trick} to include the stochastic input to resampling, thereby limiting the discontinuities in the gradient calculation after this step. Knowing the gradients of the prior and likelihood allows us to run particle Markov Chain Monte Carlo (p-MCMC) and use the No-U-Turn Sampler (NUTS) as the proposal when estimating parameters. We compare the Metropolis-adjusted Langevin algorithm (MALA), Hamiltonian Monte Carlo with different numbers of steps, and NUTS. We consider two state-space models and show that NUTS improves the mixing of the Markov chain and can produce more accurate results in less computational time.  ( 2 min )
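    For readers unfamiliar with the reparameterisation trick the abstract extends, a minimal Gaussian illustration of the idea (toy values, not the paper's filter): draw the stochastic input independently of the parameters, so the sample becomes a differentiable function of them and pathwise gradients can be estimated.

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.5, 0.7

# Stochastic input drawn independently of (mu, sigma)...
eps = rng.standard_normal(100_000)
# ...so each sample is a differentiable function of the parameters
x = mu + sigma * eps

# Pathwise gradient of E[x^2] w.r.t. mu: E[d(x^2)/dmu] = E[2x] = 2*mu
grad_est = (2 * x).mean()
print(grad_est)  # close to 2*mu = 3.0
```

    The paper's contribution is to apply the same reformulation to the stochastic input of the resampling step, where naive sampling would make the gradient discontinuous.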
    Knowledge Transfer in Engineering Fleets: Hierarchical Bayesian Modelling for Multi-Task Learning. (arXiv:2204.12404v1 [stat.ML] CROSS LISTED)
    We propose a population-level analysis to address issues of data sparsity when building predictive models of engineering infrastructure. By sharing information between similar assets, hierarchical Bayesian modelling is used to improve the survival analysis of a truck fleet (hazard curves) and power prediction in a wind farm (power curves). In each example, a set of correlated functions are learnt over the asset fleet, in a combined inference, to learn a population model. Parameter estimation is improved when sub-fleets of assets are allowed to share correlated information at different levels in the hierarchy. In turn, groups with incomplete data automatically borrow statistical strength from those that are data-rich. The correlations can be inspected to inform which assets share information for which effect (i.e. parameter).  ( 2 min )
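    The "borrowing statistical strength" mechanism can be sketched with the standard normal-normal partial-pooling estimate (a generic illustration with made-up fleet numbers, not the paper's model):

```python
import numpy as np

def partial_pool(ybar, n, mu, tau2, sigma2):
    """Posterior mean of a group effect: data-rich groups keep their own mean,
    sparse groups are shrunk toward the population mean mu."""
    prec_data, prec_prior = n / sigma2, 1.0 / tau2
    return (prec_data * ybar + prec_prior * mu) / (prec_data + prec_prior)

mu = 10.0                   # population (fleet-level) mean
# Two assets with the same observed mean 14.0 but very different data volumes
rich = partial_pool(ybar=14.0, n=500, mu=mu, tau2=4.0, sigma2=4.0)
sparse = partial_pool(ybar=14.0, n=1, mu=mu, tau2=4.0, sigma2=4.0)
print(rich, sparse)   # rich stays near 14.0; sparse is pulled toward 10.0
```

    The precision-weighted average is the essence of hierarchical shrinkage: groups with incomplete data automatically lean on the population estimate, as the abstract describes.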
    Compressed sensing of low-rank plus sparse matrices. (arXiv:2007.09457v2 [math.NA] UPDATED)
    Expressing a matrix as the sum of a low-rank matrix plus a sparse matrix is a flexible model capturing global and local features in data popularized as Robust PCA (Candes et al., 2011; Chandrasekaran et al., 2009). Compressed sensing, matrix completion, and their variants (Eldar and Kutyniok, 2012; Foucart and Rauhut, 2013) have established that data satisfying low complexity models can be efficiently measured and recovered from a number of measurements proportional to the model complexity rather than the ambient dimension. This manuscript develops similar guarantees showing that $m\times n$ matrices that can be expressed as the sum of a rank-$r$ matrix and a $s$-sparse matrix can be recovered by computationally tractable methods from $\mathcal{O}\big((r(m+n-r)+s)\log(mn/s)\big)$ linear measurements. More specifically, we establish that the low-rank plus sparse matrix set is closed provided the incoherence of the low-rank component is upper bounded as $\mu<\sqrt{mn}/(r\sqrt{s})$, and subsequently, the restricted isometry constants for the aforementioned matrices remain bounded independent of problem size provided $p/mn$, $s/p$, and $r(m+n-r)/p$ remain fixed. Additionally, we show that semidefinite programming and two hard threshold gradient descent algorithms, NIHT and NAHT, converge to the measured matrix provided the measurement operator's RICs are sufficiently small. These results also provably solve convex and non-convex formulations of Robust PCA with the asymptotically optimal fraction of corruptions $\alpha=\mathcal{O}\left(1/(\mu r) \right)$, where $s = \alpha^2 mn$, and improve the previously best known guarantees by not requiring that the fraction of corruptions is spread in every column and row by being upper bounded by $\alpha$. Numerical experiments illustrating these results are shown for synthetic problems, dynamic-foreground/static-background separation, and multispectral imaging.  ( 2 min )
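    A toy rendering of the hard-thresholding idea behind algorithms like NIHT/NAHT (a drastic simplification of the paper's algorithms, with invented data): alternate projecting the residual onto the rank-$r$ set via truncated SVD and onto the $s$-sparse set by keeping the largest entries.

```python
import numpy as np

def proj_rank(M, r):
    """Best rank-r approximation via truncated SVD."""
    U, sv, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * sv[:r]) @ Vt[:r]

def proj_sparse(M, s):
    """Keep only the s entries of largest magnitude."""
    out = np.zeros_like(M)
    idx = np.unravel_index(np.argsort(np.abs(M), axis=None)[-s:], M.shape)
    out[idx] = M[idx]
    return out

# Ground truth: an incoherent rank-1 L plus 3 sparse corruptions
L = 3.0 * np.outer(np.linspace(1, 2, 10), np.linspace(-1, 1, 10))
S = np.zeros((10, 10))
S[2, 3] = S[7, 1] = S[5, 8] = 5.0
Y = L + S

L_hat = S_hat = np.zeros_like(Y)
for _ in range(20):                       # alternating projections
    L_hat = proj_rank(Y - S_hat, 1)
    S_hat = proj_sparse(Y - L_hat, 3)
print(np.abs(L_hat - L).max())            # residual of the recovered low-rank part
```

    When the low-rank part is incoherent and the corruptions are well separated, as here, the alternation identifies the support of S and the split is recovered; the paper's guarantees make this precise for the full measurement setting.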
    Differentially Quantized Gradient Methods. (arXiv:2002.02508v4 [cs.LG] UPDATED)
    Consider the following distributed optimization scenario. A worker has access to training data that it uses to compute the gradients while a server decides when to stop iterative computation based on its target accuracy or delay constraints. The server receives all its information about the problem instance from the worker via a rate-limited noiseless communication channel. We introduce the principle we call Differential Quantization (DQ) that prescribes compensating the past quantization errors to direct the descent trajectory of a quantized algorithm towards that of its unquantized counterpart. Assuming that the objective function is smooth and strongly convex, we prove that Differentially Quantized Gradient Descent (DQ-GD) attains a linear contraction factor of $\max\{\sigma_{\mathrm{GD}}, \rho_n 2^{-R}\}$, where $\sigma_{\mathrm{GD}}$ is the contraction factor of unquantized gradient descent (GD), $\rho_n \geq 1$ is the covering efficiency of the quantizer, and $R$ is the bitrate per problem dimension $n$. Thus at any $R\geq\log_2 \rho_n /\sigma_{\mathrm{GD}}$ bits, the contraction factor of DQ-GD is the same as that of unquantized GD, i.e., there is no loss due to quantization. We show that no algorithm within a certain class can converge faster than $\max\{\sigma_{\mathrm{GD}}, 2^{-R}\}$. Since quantizers exist with $\rho_n \to 1$ as $n \to \infty$ (Rogers, 1963), this means that DQ-GD is asymptotically optimal. The principle of differential quantization continues to apply to gradient methods with momentum such as Nesterov's accelerated gradient descent, and Polyak's heavy ball method. For these algorithms as well, if the rate is above a certain threshold, there is no loss in contraction factor obtained by the differentially quantized algorithm compared to its unquantized counterpart. Experimental results on least-squares problems validate our theoretical analysis.  ( 3 min )
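    The compensation principle can be sketched as error feedback on quantized gradient descent for a toy quadratic (a schematic of the DQ idea, not the paper's coding scheme; the quantizer and constants are invented):

```python
import numpy as np

A = np.diag([1.0, 2.0])        # objective f(x) = 0.5 x^T A x - b^T x
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)

def quantize(v, step=1e-3):    # uniform scalar quantizer, a stand-in
    return step * np.round(v / step)

x = np.zeros(2)
e = np.zeros(2)                # accumulated quantization error
for _ in range(300):
    g = A @ x - b
    u = g + e                  # compensate past quantization error (the DQ idea)
    q = quantize(u)
    e = u - q                  # carry the new error forward
    x = x - 0.4 * q
print(np.abs(x - x_star).max())   # tracks unquantized GD toward x* = [1.0, 0.5]
```

    Feeding the past quantization error back before quantizing keeps the quantized trajectory close to the unquantized one, which is the mechanism behind the "no loss above a rate threshold" result.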
    Neural Collapse Inspired Attraction-Repulsion-Balanced Loss for Imbalanced Learning. (arXiv:2204.08735v2 [cs.LG] UPDATED)
    Class imbalance widely exists in real-world engineering. However, mainstream optimization algorithms that seek to minimize error will trap the deep learning model in sub-optima when facing extreme class imbalance. This seriously harms classification precision, especially on the minority classes. The essential reason is that the gradients of the classifier weights are imbalanced among the components from different classes. In this paper, we propose the Attraction-Repulsion-Balanced Loss (ARB-Loss) to balance the different components of the gradients. We perform experiments on large-scale classification and segmentation datasets, and our ARB-Loss achieves state-of-the-art performance with only one-stage training instead of the two-stage learning used by current SOTA works.  ( 2 min )
    The Multimarginal Optimal Transport Formulation of Adversarial Multiclass Classification. (arXiv:2204.12676v1 [cs.LG])
    We study a family of adversarial multiclass classification problems and provide equivalent reformulations in terms of: 1) a family of generalized barycenter problems introduced in the paper and 2) a family of multimarginal optimal transport problems where the number of marginals is equal to the number of classes in the original classification problem. These new theoretical results reveal a rich geometric structure of adversarial learning problems in multiclass classification and extend recent results restricted to the binary classification setting. A direct computational implication of our results is that by solving either the barycenter problem and its dual, or the MOT problem and its dual, we can recover the optimal robust classification rule and the optimal adversarial strategy for the original adversarial problem. Examples with synthetic and real data illustrate our results.
    Double Diffusion Maps and their Latent Harmonics for Scientific Computations in Latent Space. (arXiv:2204.12536v1 [stat.ML])
    We introduce a data-driven approach to building reduced dynamical models through manifold learning; the reduced latent space is discovered using Diffusion Maps (a manifold learning technique) on time series data. A second round of Diffusion Maps on those latent coordinates allows the approximation of the reduced dynamical models. This second round enables mapping the latent space coordinates back to the full ambient space (what is called lifting); it also enables the approximation of full state functions of interest in terms of the reduced coordinates. In our work, we develop and test three different reduced numerical simulation methodologies, either through pre-tabulation in the latent space and integration on the fly or by going back and forth between the ambient space and the latent space. The data-driven latent space simulation results, based on the three different approaches, are validated through (a) the latent space observation of the full simulation through the Nystr\"om Extension formula, or through (b) lifting the reduced trajectory back to the full ambient space, via Latent Harmonics. Latent space modeling often involves additional regularization to favor certain properties of the space over others, and the mapping back to the ambient space is then constructed mostly independently from these properties; here, we use the same data-driven approach to construct the latent space and then map back to the ambient space.
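    For readers new to Diffusion Maps, the core computation is small (a generic sketch on toy circle data, not the paper's double-diffusion pipeline): a Gaussian kernel, row-normalization to a Markov matrix, and its leading nontrivial eigenvectors as latent coordinates.

```python
import numpy as np

def diffusion_maps(X, eps, k=2):
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-D2 / eps)
    P = K / K.sum(axis=1, keepdims=True)   # row-stochastic Markov matrix
    w, V = np.linalg.eig(P)
    order = np.argsort(-w.real)
    # Skip the trivial eigenvalue 1 (constant eigenvector)
    return w.real[order][1:k + 1], V.real[:, order][:, 1:k + 1]

t = np.linspace(0, 2 * np.pi, 60, endpoint=False)
X = np.c_[np.cos(t), np.sin(t)]            # data sampled on a circle
evals, coords = diffusion_maps(X, eps=0.5)
print(evals)   # the leading nontrivial eigenvalues, a near-degenerate pair here
```

    Running a second round of Diffusion Maps on `coords`, as the abstract describes, is what yields the latent harmonics used for lifting back to the ambient space.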
    Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback. (arXiv:2204.12764v1 [cs.LG])
    We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, the losses of an action are split into $d$ components, spreading over consecutive rounds after the action is chosen. In each round, the algorithm observes the aggregation of losses that come from the latest $d$ rounds. Previous works focus on the oblivious adversarial setting, while we investigate the harder non-oblivious setting. We show that the non-oblivious setting incurs $\Omega(T)$ pseudo regret even when the loss sequence has bounded memory. However, we propose a wrapper algorithm which enjoys $o(T)$ policy regret on many adversarial bandit problems under the assumption that the loss sequence has bounded memory. In particular, for the $K$-armed bandit and bandit convex optimization, we obtain an $\mathcal{O}(T^{2/3})$ policy regret bound. We also prove a matching lower bound for the $K$-armed bandit. Our lower bound works even when the loss sequence is oblivious but the delay is non-oblivious. It answers the open problem proposed in \cite{wang2021adaptive}, showing that non-oblivious delay is enough to incur $\tilde{\Omega}(T^{2/3})$ regret.
    Performance and Interpretability Comparisons of Supervised Machine Learning Algorithms: An Empirical Study. (arXiv:2204.12868v1 [stat.ML])
    This paper compares the performances of three supervised machine learning algorithms in terms of predictive ability and model interpretation on structured or tabular data. The algorithms considered were scikit-learn implementations of extreme gradient boosting machines (XGB) and random forests (RFs), and feedforward neural networks (FFNNs) from TensorFlow. The paper is organized in a findings-based manner, with each section providing general conclusions supported by empirical results from simulation studies that cover a wide range of model complexity and correlation structures among predictors. We considered both continuous and binary responses of different sample sizes. Overall, XGB and FFNNs were competitive, with FFNNs showing better performance in smooth models and tree-based boosting algorithms performing better in non-smooth models. This conclusion held generally for predictive performance, identification of important variables, and determining correct input-output relationships as measured by partial dependence plots (PDPs). FFNNs generally had less over-fitting, as measured by the difference in performance between training and testing datasets. However, the difference with XGB was often small. RFs did not perform well in general, confirming the findings in the literature. All models exhibited different degrees of bias seen in PDPs, but the bias was especially problematic for RFs. The extent of the biases varied with correlation among predictors, response type, and data set sample size. In general, tree-based models tended to over-regularize the fitted model in the tails of predictor distributions. Finally, as to be expected, performances were better for continuous responses compared to binary data and with larger samples.
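    Since partial dependence plots (PDPs) figure centrally in the comparison, a minimal hand-rolled PDP for reference (hypothetical model and data, not the study's simulations): clamp one feature to each grid value and average the predictions.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Average prediction with feature j clamped to each grid value."""
    pd = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        pd.append(predict(Xv).mean())
    return np.array(pd)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
predict = lambda X: 2.0 * X[:, 0] + X[:, 1] ** 2   # a known toy "model"
grid = np.linspace(-2, 2, 5)
pdp0 = partial_dependence(predict, X, 0, grid)
print(pdp0)   # linear in the grid with slope 2, as expected for feature 0
```

    For an additive model the PDP recovers the input-output relationship exactly; the biases the paper reports arise when fitted models distort this curve, especially in the tails of the predictor distributions.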
    Transfer Learning with Pre-trained Conditional Generative Models. (arXiv:2204.12833v1 [cs.LG])
    Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods generally assume at least one of the following: (i) source and target task label spaces overlap, (ii) source datasets are available, and (iii) target network architectures are consistent with source ones. However, all of these assumptions are difficult to satisfy in practical settings because the target task rarely has the same labels as the source task, source dataset access is restricted due to licensing and storage costs, and the target architecture is often specialized to each task. To transfer source knowledge without these assumptions, we propose a transfer learning method that uses deep generative models and is composed of the following two stages: pseudo pre-training (PP) and pseudo semi-supervised learning (P-SSL). PP trains a target architecture with a synthesized dataset by using conditional source generative models. P-SSL applies SSL algorithms to labeled target data and unlabeled pseudo samples, which are generated by cascading the source classifier and generative models to condition them with target samples. Our experimental results indicate that our method can outperform baselines of scratch training and knowledge distillation.
    An Empirical Study of the Occurrence of Heavy-Tails in Training a ReLU Gate. (arXiv:2204.12554v1 [cs.LG])
    A particular direction of recent advances in stochastic deep-learning algorithms has been the uncovering of a rather mysterious heavy-tailed nature of the stationary distribution of these algorithms, even when the data distribution is not heavy-tailed. Moreover, the heavy-tail index is known to show interesting dependence on the input dimension of the net, the mini-batch size and the step size of the algorithm. In this short note, we undertake an experimental study of this index for S.G.D. while training a ReLU gate (in the realizable and in the binary classification setup) and for a variant of S.G.D. that was proven to converge in Karmakar and Mukherjee (2022) for realizable ReLU data. From our experiments we conjecture that these two algorithms have similar heavy-tail behaviour on any data where the latter can be proven to converge. Secondly, we demonstrate that the heavy-tail index of the late time iterates in this model scenario has strikingly different properties than either what has been proven for linear hypothesis classes or what has been previously demonstrated for large nets.
    Variational Kalman Filtering with Hinf-Based Correction for Robust Bayesian Learning in High Dimensions. (arXiv:2204.13089v1 [stat.ML])
    In this paper, we address the problem of convergence of sequential variational inference filter (VIF) through the application of a robust variational objective and Hinf-norm based correction for a linear Gaussian system. As the dimension of state or parameter space grows, performing the full Kalman update with the dense covariance matrix for a large scale system requires increased storage and computational complexity, making it impractical. The VIF approach, based on mean-field Gaussian variational inference, reduces this burden through the variational approximation to the covariance usually in the form of a diagonal covariance approximation. The challenge is to retain convergence and correct for biases introduced by the sequential VIF steps. We desire a framework that improves feasibility while still maintaining reasonable proximity to the optimal Kalman filter as data is assimilated. To accomplish this goal, a Hinf-norm based optimization perturbs the VIF covariance matrix to improve robustness. This yields a novel VIF-Hinf recursion that employs consecutive variational inference and Hinf based optimization steps. We explore the development of this method and investigate a numerical example to illustrate the effectiveness of the proposed filter.  ( 2 min )
    Accurate inference of crowdsourcing properties when using efficient allocation strategies. (arXiv:1903.03104v2 [cs.LG] UPDATED)
    Allocation strategies improve the efficiency of crowdsourcing by decreasing the work needed to complete individual tasks accurately. However, these algorithms introduce bias by preferentially allocating workers onto easy tasks, leading to sets of completed tasks that are no longer representative of all tasks. This bias challenges inference of problem-wide properties such as typical task difficulty or crowd properties such as worker completion times, important information that goes beyond the crowd responses themselves. Here we study inference about problem properties when using an allocation algorithm to improve crowd efficiency. We introduce Decision-Explicit Probability Sampling (DEPS), a novel method to perform inference of problem properties while accounting for the potential bias introduced by an allocation strategy. Experiments on real and synthetic crowdsourcing data show that DEPS outperforms baseline inference methods while still leveraging the efficiency gains of the allocation method. The ability to perform accurate inference of general properties when using non-representative data allows crowdsourcers to extract more knowledge out of a given crowdsourced dataset.  ( 2 min )
    Faster online calibration without randomization: interval forecasts and the power of two choices. (arXiv:2204.13087v1 [cs.LG])
    We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $\epsilon$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $\Omega(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster $\epsilon$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.  ( 2 min )
    Scalable particle-based alternatives to EM. (arXiv:2204.12965v1 [stat.CO])
    Building on (Neal and Hinton, 1998), where the problem tackled by EM is recast as the optimization of a free energy functional on an infinite-dimensional space, we obtain three practical particle-based alternatives to EM applicable to broad classes of models. All three are derived through straightforward discretizations of gradient flows associated with the functional. The novel algorithms scale well to high-dimensional settings and outperform existing state-of-the-art methods in numerical experiments.  ( 2 min )
    Closing the Gap between Single-User and Multi-User VoiceFilter-Lite. (arXiv:2202.12169v2 [eess.AS] UPDATED)
    VoiceFilter-Lite is a speaker-conditioned voice separation model that plays a crucial role in improving speech recognition and speaker verification by suppressing overlapping speech from non-target speakers. However, one limitation of VoiceFilter-Lite, and other speaker-conditioned speech models in general, is that these models are usually limited to a single target speaker. This is undesirable as most smart home devices now support multiple enrolled users. In order to extend the benefits of personalization to multiple users, we previously developed an attention-based speaker selection mechanism and applied it to VoiceFilter-Lite. However, the original multi-user VoiceFilter-Lite model suffers from significant performance degradation compared with single-user models. In this paper, we devised a series of experiments to improve the multi-user VoiceFilter-Lite model. By incorporating a dual learning rate schedule and by using feature-wise linear modulation (FiLM) to condition the model with the attended speaker embedding, we successfully closed the performance gap between multi-user and single-user VoiceFilter-Lite models on single-speaker evaluations. At the same time, the new model can also be easily extended to support any number of users, and significantly outperforms our previously published model on multi-speaker evaluations.  ( 2 min )
    IH-GAN: A Conditional Generative Model for Implicit Surface-Based Inverse Design of Cellular Structures. (arXiv:2103.02588v4 [cs.CE] UPDATED)
    Variable-density cellular structures can overcome connectivity and manufacturability issues of topologically optimized structures, particularly those represented as discrete density maps. However, the optimization of such cellular structures is challenging due to the multiscale design problem. Past work addressing this problem generally either only optimizes the volume fraction of single-type unit cells but ignores the effects of unit cell geometry on properties, or considers the geometry-property relation but builds this relation via heuristics. In contrast, we propose a simple yet more principled way to accurately model the property to geometry mapping using a conditional deep generative model, named Inverse Homogenization Generative Adversarial Network (IH-GAN). It learns the conditional distribution of unit cell geometries given properties and can realize the one-to-many mapping from properties to geometries. We further reduce the complexity of IH-GAN by using the implicit function parameterization to represent unit cell geometries. Results show that our method can 1) generate various unit cells that satisfy given material properties with high accuracy ($R^2$-scores between target properties and properties of generated unit cells $>98\%$) and 2) improve the optimized structural performance over the conventional variable-density single-type structure. In the minimum compliance example, our IH-GAN generated structure achieves a $79.7\%$ reduction in concentrated stress and an extra $3.03\%$ reduction in displacement. In the target deformation examples, our IH-GAN generated structure reduces the target matching error by $86.4\%$ and $79.6\%$ for two test cases, respectively. We also demonstrated that the connectivity issue for multi-type unit cells can be solved by transition layer blending.  ( 3 min )
    Forecasting Foreign Exchange Rates With Parameter-Free Regression Networks Tuned By Bayesian Optimization. (arXiv:2204.12914v1 [q-fin.ST])
    The article is concerned with the problem of multi-step financial time series forecasting of Foreign Exchange (FX) rates. To address this problem, we introduce a parameter-free regression network termed RegPred Net. The exchange rate to forecast is treated as a stochastic process. It is assumed to follow a generalization of Brownian motion and the mean-reverting process referred to as the generalized Ornstein-Uhlenbeck (OU) process, with time-dependent coefficients. Using past observed values of the input time series, these coefficients can be regressed online by the cells of the first half of the network (Reg). The regressed coefficients depend only on - but are very sensitive to - a small number of hyperparameters required to be set by a global optimization procedure for which Bayesian optimization is an adequate heuristic. Thanks to its multi-layered architecture, the second half of the regression network (Pred) can project time-dependent values for the OU process coefficients and generate realistic trajectories of the time series. Predictions can be easily derived in the form of expected values estimated by averaging values obtained by Monte Carlo simulation. The forecasting accuracy on a 100 days horizon is evaluated for several of the most important FX rates such as EUR/USD, EUR/CNY, and EUR/GBP. Our experimental results show that the RegPred Net significantly outperforms ARMA, ARIMA, LSTMs, and Autoencoder-LSTM models in this task.  ( 2 min )
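    As background, the constant-coefficient Ornstein-Uhlenbeck process that the network generalizes can be simulated with a simple Euler-Maruyama step; the parameters below are illustrative, not fitted FX values:

```python
import numpy as np

def simulate_ou(x0, theta, mu, sigma, dt, n, rng):
    """dX = theta*(mu - X) dt + sigma dW, Euler-Maruyama discretization."""
    x = np.empty(n + 1)
    x[0] = x0
    for t in range(n):
        x[t + 1] = (x[t] + theta * (mu - x[t]) * dt
                    + sigma * np.sqrt(dt) * rng.standard_normal())
    return x

rng = np.random.default_rng(0)
path = simulate_ou(x0=0.0, theta=2.0, mu=1.0, sigma=0.1, dt=0.01, n=5000, rng=rng)
print(path[-2500:].mean())   # mean-reverts toward mu = 1.0
```

    RegPred Net's Reg half, as described above, regresses time-dependent versions of `theta`, `mu`, and `sigma` online, and the Pred half rolls the process forward to generate forecast trajectories.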
    On the Dynamics of Inference and Learning. (arXiv:2204.12939v1 [cond-mat.dis-nn])
    Statistical Inference is the process of determining a probability distribution over the space of parameters of a model given a data set. As more data becomes available this probability distribution becomes updated via the application of Bayes' theorem. We present a treatment of this Bayesian updating process as a continuous dynamical system. Statistical inference is then governed by a first order differential equation describing a trajectory or flow in the information geometry determined by a parametric family of models. We solve this equation for some simple models and show that when the Cram\'{e}r-Rao bound is saturated the learning rate is governed by a simple $1/T$ power-law, with $T$ a time-like variable denoting the quantity of data. The presence of hidden variables can be incorporated in this setting, leading to an additional driving term in the resulting flow equation. We illustrate this with both analytic and numerical examples based on Gaussians and Gaussian Random Processes and inference of the coupling constant in the 1D Ising model. Finally we compare the qualitative behaviour exhibited by Bayesian flows to the training of various neural networks on benchmarked data sets such as MNIST and CIFAR10 and show that, for networks exhibiting small final losses, the simple power law is also satisfied.  ( 2 min )
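    The $1/T$ law is easy to verify in the simplest conjugate setting, a Gaussian mean with known noise (a toy check, not the paper's information-geometric flow):

```python
import numpy as np

sigma2, prior_var = 1.0, 10.0

def posterior_var(T):
    """Posterior variance of a Gaussian mean after T observations
    with known noise variance sigma2 and a Gaussian prior."""
    return 1.0 / (1.0 / prior_var + T / sigma2)

for T in (10, 100, 1000):
    print(T, posterior_var(T), sigma2 / T)   # ~ sigma2 / T for large T
```

    The posterior precision grows linearly in the amount of data, so uncertainty contracts like $1/T$ once the prior's contribution is negligible, mirroring the saturated Cramér-Rao behaviour described above.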
    First do no harm: counterfactual objective functions for safe & ethical AI. (arXiv:2204.12993v1 [cs.AI])
    To act safely and ethically in the real world, agents must be able to reason about harm and avoid harmful actions. In this paper we develop the first statistical definition of harm and a framework for factoring harm into algorithmic decisions. We argue that harm is fundamentally a counterfactual quantity, and show that standard machine learning algorithms are guaranteed to pursue harmful policies in certain environments. To resolve this, we derive a family of counterfactual objective functions that robustly mitigate for harm. We demonstrate our approach with a statistical model for identifying optimal drug doses. While identifying optimal doses using the causal treatment effect results in harmful treatment decisions, our counterfactual algorithm identifies doses that are far less harmful without sacrificing efficacy. Our results show that counterfactual reasoning is a key ingredient for safe and ethical AI.  ( 2 min )

  • Open

    [P] Auto-encoder image dimension error
    Hello, I have a problem regarding my auto-encoder model. I built it so that it gives an abnormally high MSE when presented with an image too different from what it knows. I have this little function at the end of my program to predict whether or not the image presented is an anomaly:

```python
def check_anomaly(img_path):
    density_threshold = 2500  # Set this value based on the above exercise
    reconstruction_error_threshold = 0.004  # Set this value based on the above exercise
    img = Image.open(img_path)
    img = np.array(img.resize((128, 128), Image.ANTIALIAS))
    plt.imshow(img)
    img = img / 255.
    img = img[np.newaxis, :, :, :]
    encoded_img = encoder_model.predict([[img]])
    encoded_img = [np.reshape(img, (out_vector_shape)) for img in encoded_img]
    density = kde.score_samples(encoded_img)[0]
    reconstruc…
```
  ( 1 min )
    What's the actual difference in simple terms between Cloudera Data Science Workbench, DataRobot, and DataBricks? [Discussion]
    I've also noticed Cloudera and DataRobot seem to follow user pricing where DataBricks follows the standard hourly machine rate. submitted by /u/SmarterChild8675309 [link] [comments]
    [Project] Multi-Bounding box identification
    I have a data set of satellite images of aircraft. For the labels of these images I have the bounding boxes for all aircraft in that image. I'm looking for some advice on where to get started with this. I need to estimate all bounding boxes for these aircraft, any suggestions? submitted by /u/The_Dov [link] [comments]
    [D] NLP: anyone familiar with taxonomy extraction and evaluation?
    Hi all, does anyone have experience with automatic taxonomy/ontology extraction from unlabelled corpus, especially how to evaluate the extracted structures without gold standards? most published papers would invite students/researchers to conduct manual reviews, thus making it very difficult to compare the results. thanks in advance. submitted by /u/CestLucas [link] [comments]
    [D] Precision-Recall curve with best F1 Score of 0.56
    I am evaluating the performance of a model. I get different Precision-Recall curves for different configurations and the best F1 score (0.56) corresponds to the red point. Would you consider this performance acceptable? I know there is a lot of room for improvement but is it okay or is it extremely poor performance? https://preview.redd.it/k5qihccdb5w81.png?width=784&format=png&auto=webp&s=a0f3408f2ff671ee491faa50752b6c8e0cf17e4c Also, if I understood it correctly, each point of the Precision-Recall curve has its own F1 Score, right? Thank you! submitted by /u/SeaResponsibility176 [link] [comments]  ( 1 min )
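    On the sub-question: yes, every (precision, recall) point induces its own F1 score via the harmonic mean; a quick check with illustrative numbers:

```python
def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(f1(0.5, 0.64))   # ~0.561: each point on the PR curve has its own F1
```

    So the "best F1" point on a PR curve is simply the point where this harmonic mean is maximized along the curve.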
    [P] surgeon-pytorch – a small library to inspect intermediate layers of pyTorch models
    I've made surgeon https://github.com/archinetai/surgeon-pytorch, a small library to inspect the intermediate output layers of pyTorch models without changing the original implementation. This can be very useful if you are using pre-trained models (e.g. from Huggingface or torch.hub) and want to get embeddings, attention matrices, or simply debug the model without adding additional code – which is often hard to do without changing the implementation. I hope this can be helpful to anyone! submitted by /u/Aglitter [link] [comments]  ( 1 min )
    [D] Thoughts on the AI4 conference?
    Lately, I have been receiving many messages from the organizers of this conference saying that they have some free passes to attend it: https://ai4.io/usa/ While the speaker line-up from the industry appears impressive, I have not heard of this conference before. Any thoughts on how legit/good this conference is? submitted by /u/roalddahl14 [link] [comments]  ( 1 min )
    [D] Has anyone attempted to recreate the work described in the Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry paper?
    The D3VO paper from the Technical University of Munich looks quite promising; however, that university does not seem very keen on publishing code along with their papers. Has anyone already tried to reproduce their work? What kind of results did you see in your version? What hardware did you run it on? How difficult was it to reproduce their models in PyTorch? submitted by /u/autojazari [link] [comments]  ( 1 min )
    [D] - Are GFlow Nets considered diffusion models?
    I stumbled upon GFlowNets, and in my opinion they look very similar to diffusion models. There is a touch of RL in GFlowNets, but the main idea is very similar to diffusion models. Is that right? Or am I missing something? submitted by /u/dimem16 [link] [comments]
    [D] How is it checked if models do not just memorize their training examples?
    Hello everyone, this post is about generative models (i.e. score-based generative models, GANs, etc.) on leaderboards like this: https://paperswithcode.com/sota/image-generation-on-cifar-10 How do they check that the models do not just memorize the training examples? The FID score would be optimal if you just generated training examples again. Best submitted by /u/future_gcp_poweruser [link] [comments]  ( 5 min )
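    One common complementary check (not something the leaderboard itself enforces) is a nearest-neighbor test: embed generated and training samples in some feature space and look at each generated sample's distance to its closest training example; near-zero distances suggest copying. A toy sketch with made-up 2-D features standing in for, e.g., Inception features:

```python
# For each generated sample, find the distance to its nearest training example.
def nearest_distance(x, train):
    return min(sum((a - b) ** 2 for a, b in zip(x, t)) ** 0.5 for t in train)

train = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
generated = [(0.0, 0.0), (1.5, 0.8)]  # first sample is an exact copy

dists = [nearest_distance(g, train) for g in generated]
copies = [d < 1e-6 for d in dists]   # flag near-duplicates of training data
print(copies)  # [True, False]
```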
    [D] ICLR 2022 blog post track
    It seems like the list of accepted blog posts has been published for the ICLR 2022 blog post track (https://iclr-blog-track.github.io/). They also invited Karpathy to publish his most recent blog post on the history and future of convnets. What do you think about the blog posts? Any blog posts that you particularly like or that are definitely worth reading? Any blog posts that actually have interesting contributions and/or that you plan to cite? But maybe more importantly: how do you think this will evolve? They seem to have decided to organise the blog post track again for next year's ICLR already. Do you think these kinds of publications could have an impact on how we do science, or is this rather a nice extra on top of scientific work? There has only been one post in this subreddit on one of the accepted blog posts (since the announcement). I think there are some nice blog posts in there, so I expected to find some discussions here, but it seems like it is either ignored or not worth discussing. Therefore, I thought it would be interesting to (try to) start a discussion. TL;DR: what do you think of the ICLR blog post track and/or the accepted blog posts? submitted by /u/mr_tsjolder [link] [comments]  ( 1 min )
    [D] MLOps tools for automatic fine tuning of deployed machine learning models
    I'm working on an ML model for data extraction from documents. The model is trained and deployed into production. For fine-tuning the model and improving performance, I added the possibility for users to correct the extracted data. A concrete example would be: the model labels a word in the document as "company_name" and the user corrects it to "street_name". This correction is then used to fine-tune the model. Currently the fine-tuning is done manually. That is, if the number of corrections exceeds some threshold, I take them, start a new training run, and evaluate the new model before putting it into production. My question would be: is there an MLOps tool that automates this process? Or should I write one myself? I am aware of the tool Seldon Core, which offers A/B testing for comparing the old model with the new one and putting it into production. But unfortunately it does not offer automatic fine-tuning. Or that's what I understood from their website. submitted by /u/alzoubi36 [link] [comments]  ( 1 min )
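    Before adopting a full platform, the trigger logic described above can be sketched in a few lines; the function names below are hypothetical placeholders for an actual training/evaluation/deployment pipeline, not a real tool's API:

```python
# Count corrections since the last deploy; past a threshold, fine-tune,
# evaluate, and deploy only if the new model passes evaluation.
THRESHOLD = 500  # hypothetical correction count that triggers retraining

def maybe_retrain(n_corrections, train_fn, eval_fn, deploy_fn):
    if n_corrections < THRESHOLD:
        return "waiting"
    model = train_fn()
    return deploy_fn(model) if eval_fn(model) else "rejected"

status = maybe_retrain(
    n_corrections=612,
    train_fn=lambda: "model-v2",
    eval_fn=lambda m: True,          # e.g. beats the current model on a holdout
    deploy_fn=lambda m: f"deployed {m}",
)
print(status)  # deployed model-v2
```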
    [P] Create interactive slides for Machine Learning models from Jupyter Notebook
    Would you like to create an interactive presentation for your ML model directly from Jupyter Notebook? I'm working on an open-source project for converting notebooks into interactive documents. Recently, I've added the option to turn notebooks into interactive slides. You can showcase your Machine Learning model as an interactive presentation. During the presentation, the user can change values and recompute the slide! I've created a demo presentation where I use Random Forest to predict Iris species. The screenshot recording of the presentation: https://github.com/pplonski/ml-model-slides/raw/main/media/slides-from-ml-model.gif The presentation is available online (deployed at Heroku) https://ml-model-presentation.herokuapp.com/ What is more, the presentation code is on GitHub https://github.com/pplonski/ml-model-slides (yes, code is presentation! bye-bye PPT) In case anyone is interested, the framework is called Mercury. submitted by /u/pp314159 [link] [comments]  ( 1 min )
    [D] Feature engineering automation?
    Hello, I’m working as a Data Scientist currently and I realized that most of my time is spent on feature engineering. My general practice is that I create aggregations of data (via sql because of the amount of data that needs to be processed) like sum, mean, avg, std, median, q25, q75. I need to do it on a few dozen features. Also I am calculating these aggregations on different time windows: previous week, previous month, previous 3 month. At the end I end up with hundreds of features and I need to select the ones that make any sense, contain relevant information. Currently I am applying pandas profiling, or sweetviz on this huge dataset and trying to analyze it by eyeballing the results. My main challenge is that this process is highly repetitive and manual. I am wondering if there is any tool out there that could help me automate this process and make some parts reusable? I like having a UI especially for visualizing the data. Am I doing something wrong or is there a tool that I’m clearly not aware of? submitted by /u/sgergely [link] [comments]  ( 4 min )
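    One way to cut the repetition while staying in SQL is to generate the aggregation queries programmatically from (feature, statistic, window) combinations rather than writing them by hand. A rough sketch with hypothetical table and column names (SQLite date syntax for illustration):

```python
# Generate one windowed-aggregation column per (feature, statistic, window).
features = ["amount", "duration"]
stats = ["SUM", "AVG", "MIN", "MAX"]
windows = {"w7": 7, "w30": 30}  # suffix -> lookback days

def build_query(table="events", date_col="event_date"):
    cols = [
        f"{s}(CASE WHEN {date_col} >= date('now', '-{d} day') "
        f"THEN {f} END) AS {f}_{s.lower()}_{w}"
        for f in features for s in stats for w, d in windows.items()
    ]
    return f"SELECT user_id, {', '.join(cols)} FROM {table} GROUP BY user_id"

query = build_query()
print(len(features) * len(stats) * len(windows))  # 16 generated columns
```

The same combination list can later drive feature selection, since every generated column name encodes its feature, statistic, and window.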
    [D] How to store/surface predictions along with immutable data in a database api?
    Hi, I am faced with something that I thought was simple, but I have been thinking about it so much that I am now very confused. Any suggestions are helpful! Let's assume the scenario that you have a database of cars. For each car you have a brand, model, colour, etc. You have an API for this DB which you can call, with a car id, to surface the car data. People rely on this API for fast and up-to-date car information. You have 1 million cars in the DB. Assume now there is a requirement to predict yes/no whether the car has a black front bumper (completely made-up requirement). This has to be surfaced to the API users along with the car data. You have built a classifier for this, and it takes some time to surface a prediction, e.g. 2 mins. You select what you think is the best operating point for your classifier. When a new car gets added into the DB you can now run your predictor, and you get back a probability and, depending on the operating point, a yes/no. What is the best approach here? Do you store the result of this prediction in the database (probability, model_version and boolean result) alongside the rest of the car data? Less flexible, as if you decide to tweak the operating point you will need to recalculate the new boolean values for all cars - but you have consistency between what you have in the DB and what you return to your users. If there is a mistake you can readily fix it by altering the boolean field. Do you just store the probabilities and decide on the yes/no using the model_version & operating point only when you surface the data to the end user? More flexible if you decide to tweak the operating point of the classifier. Or would you have a completely different approach? Would you change anything if, for some cars, you already have the ground-truth data of whether they have a black bumper in the DB? Thanks for reading! submitted by /u/42isthenumber_ [link] [comments]  ( 2 min )
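    For what it's worth, the second option ("store only the probability, derive the boolean at serve time") can be sketched roughly as follows; the schema and names are illustrative only, not a real design:

```python
# The DB row stores only (probability, model_version); the yes/no is
# derived at serve time from a per-model-version operating point,
# so the threshold can be tuned without re-scoring all cars.
OPERATING_POINTS = {"v1": 0.5, "v2": 0.65}

def serve_car(row):
    threshold = OPERATING_POINTS[row["model_version"]]
    return {**row, "black_front_bumper": row["probability"] >= threshold}

car = {"id": 42, "probability": 0.6, "model_version": "v1"}
print(serve_car(car)["black_front_bumper"])  # True under v1's threshold
```

The same 0.6 probability would serve as False under v2's stricter 0.65 operating point, which is exactly the flexibility (and the consistency risk) the post describes.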
    [2202.12742] Learning Relative Return Policies With Upside-Down Reinforcement Learning
    submitted by /u/chimp73 [link] [comments]
    What does it mean to centralise the observation in MARL?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Off-policy algorithm with batch of actions
    Hello everyone, first of all sorry for my poor English. I am fairly new to Deep RL, and RL in general. I want to implement a custom environment with multiple robots in it. Each robot would be given the same task, so what I am trying to do is simulate parallelism. My question is: if I have 100 robots in the simulation, do I have to instantiate 100 agents (neural networks) to control those robots, or will a single network with a batch size of 100 suffice? Does a batch of observations in any way disturb the agent's action (network) output? It would be much more memory efficient with a single neural network. So far, I've seen that while acting in the env the agent takes an observation of batch size 1 and outputs the corresponding action for that observation. Since each observation in the batch is propagated through the network without calculating gradients, it should not be affected by the batch size. If anyone could explain whether my way of thinking is wrong, and why. :) submitted by /u/Dexter_fixxor [link] [comments]  ( 1 min )
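    To illustrate the single-network option the post is asking about: a forward pass over a batch of observations produces one action per row, and rows do not interact, so an individual robot's action is unaffected by the batch size. A toy linear "policy" (no framework, plain lists) makes this concrete:

```python
# Toy linear policy: obs_dim=2 -> action_dim=2, shared by all robots.
W = [[0.5, -0.2], [0.1, 0.3]]

def policy(obs_batch):
    # Matrix product row by row: one action vector per observation.
    return [[sum(w * o for w, o in zip(col, obs)) for col in zip(*W)]
            for obs in obs_batch]

single = policy([[1.0, 2.0]])[0]          # batch size 1
batched = policy([[1.0, 2.0]] * 100)      # 100 robots, same observation
print(all(row == single for row in batched))  # True
```

The same holds for a real neural network at inference time: batching is purely a throughput optimization, not a change in per-robot behavior.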
    "NeuPL: Neural Population Learning", Liu et al 2022 (encoding PBT agents into a single multi-policy agent)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    How do Actor-Critic networks reduce the variance compared to other PG method like Reinforce ?
    I understand why REINFORCE has high variance, but how does AC mitigate it? submitted by /u/aabra__ka__daabra [link] [comments]  ( 1 min )
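    A toy numerical sketch of the usual answer: actor-critic scales the score function by the advantage A = G - V(s) instead of the raw return G, which leaves the gradient estimate unbiased (same mean) while shrinking its variance. The numbers below are made up for illustration:

```python
import statistics

# Per-sample (return, score-function) pairs; the score term is +/-1 here.
data = [(10.0, 1), (12.0, -1), (11.0, 1), (13.0, -1)]
baseline = 11.5  # critic's estimate of V(s)

grad_reinforce = [g * s for g, s in data]          # scaled by raw return G
grad_ac = [(g - baseline) * s for g, s in data]    # scaled by advantage G - V(s)

# Same mean (unbiased), much smaller variance for the advantage version.
print(statistics.mean(grad_reinforce), statistics.mean(grad_ac))
print(statistics.pvariance(grad_reinforce) > statistics.pvariance(grad_ac))  # True
```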
    on-policy vs off-policy
    I'm looking for a concrete explanation of the difference between on-policy and off-policy learning strategies; if possible, the mathematics could also be explained. submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
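    The textbook one-line contrast is the TD target: SARSA (on-policy) bootstraps from the action the behavior policy actually took next, while Q-learning (off-policy) bootstraps from the greedy action regardless of what was taken. A toy update with made-up numbers:

```python
# One TD update under each rule, from the same transition (s0, a0, r, s1).
alpha, gamma = 0.5, 0.9
Q = {("s1", "a"): 1.0, ("s1", "b"): 3.0}

r, s_next, a_next_taken = 1.0, "s1", "a"  # behavior policy explored "a"
q = 2.0  # current estimate Q(s0, a0)

# SARSA: target uses the action actually taken next (on-policy).
sarsa = q + alpha * (r + gamma * Q[(s_next, a_next_taken)] - q)
# Q-learning: target uses the greedy action (off-policy).
qlearn = q + alpha * (r + gamma * max(Q[(s_next, x)] for x in ("a", "b")) - q)

print(sarsa, qlearn)  # the off-policy target uses the max, so qlearn > sarsa here
```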
    Open position at ZF Friedrichshafen AG- Algorithmenentwickler AI & ML
    Feel free to apply! Algorithmenentwickler-AI-&-Machine-Learning-Motion-Planning submitted by /u/gab_ma [link] [comments]
    Internships/Thesis in the field of AI @ZF
    We are currently offering Internships & Theses in the Field of Artificial Intelligence. Here are the links to our open positions. Feel free to share the post! ✌🏽 Mandatory Internship Software Development in the field of Artificial Intelligence Pflichtpraktikant Reinforcement Learning Algorithmen Pflichtpraktikum/ Masterarbeit: Reinforcement Learning submitted by /u/gab_ma [link] [comments]  ( 1 min )
    General questions for those of you up to date on the topic.
    What does the future of deep reinforcement learning look like? I feel like people were pretty hyped about it a year or two ago. Is it heavily researched now and expected to be used more in the future? What are some real world tasks that it can help with? I've seen it used a lot for playing games, self driving cars, and manufacturing robotics. Anything else that it can be applied to? Will it likely be used for autonomous robots when they become more common? What are some areas that can be improved upon? What are the most recent advancements and what is a good topic to focus on for future research? Thank you. submitted by /u/johnGettings [link] [comments]  ( 2 min )
    When will humorous AIs press our buttons with their jokes? | Psyche Ideas
    submitted by /u/estasfuera [link] [comments]
    Best AI for blog post creation? Or what tools do you use to get an outline faster?
    I've only ever used writesonic.com. I got premium account access for free. I love it for when you need to quickly spin an article if I'm on a backlink-building spree. Haven't tested it against others, but overall it's good, and improvements are being made on it all the time (it's in active development). submitted by /u/CliffWoolum [link] [comments]  ( 1 min )
    Showcase your ML model in a python Web GUI
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    John Deere is becoming one of the world's most important AI companies
    submitted by /u/estasfuera [link] [comments]  ( 1 min )
    I keep getting terrible advice from AI assistants. Help me rate the worst and the best responses :)
    I was playing around with several AI models and this sort of stuff happens all the time. https://preview.redd.it/6arq56nih3w81.png?width=1163&format=png&auto=webp&s=5ca68dce977ae671b0b928c2f1ec3cdd8478257f I decided to prepare a compilation and rank the answers. Can you help me with deciding how bad or good are some of the answers by taking a survey here? Edit: I apologize if some of you felt tricked into completing the survey by the original version of the post. It does have about 50 questions but they are mostly just yes/no, good/bad, so it shouldn't take longer than 10 minutes. I intend to write a piece about how AI assistants are doing and prepare a compilation of AI fails. I will share the results :) submitted by /u/KazRainer [link] [comments]  ( 1 min )
    Is there a dataset for personal items?
    Hi! I'm looking for a dataset containing images of personal items (wallet, keys, phone, etc.), annotated with bounding boxes. Can't seem to find anything; does anyone know of such a dataset? Thanks in advance! submitted by /u/ifinty [link] [comments]  ( 1 min )
    Can I get AI language bots that have already been trained?
    I'm brand new to programming. I am thinking I need a bot to skim through PDFs, and through trial and error, I can take the PDFs and make the bot come up with suggestions for scripts or books. However, I need to use an AI which already has training in English grammar and sentence structure and language/information flow, not necessarily interpretation of meaning, philosophy, or other higher levels of language proficiency. Say I wanted to write a novel about sailors. I'd let it skim some 100 novels on sailing and generate inputs. Then I'd match it up against a separate set of AIs to fact-check each other, or even with forum blogs from some sailing subreddits or news feeds. I could do this with any topic: medicine, engineering, biology, drama novels, etc. Are there any free/open-source bots that already know the English language and might read books and make suggestions based on guided inputs? submitted by /u/International__ [link] [comments]  ( 3 min )
    Edit Images Using Sketches! NVIDIA EditGAN Explained. Control any feature from quick drafts
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Train Test Split in Way Too Much Depth
    submitted by /u/mgalarny [link] [comments]
    AI can't help blind people like me with everything... But it certainly can help me find my clothes
    submitted by /u/thisisjoshtseng [link] [comments]  ( 1 min )
    Guess it's still too soon to ask bots about anything feeling-related
    https://preview.redd.it/jbvwyd7u41w81.png?width=393&format=png&auto=webp&s=6cbd5568a84c4d96c03ca9a99bbed73b1576a1d9 Any bots that can answer better? submitted by /u/MalwareLord [link] [comments]
    How do you “get your own AI bot?”
    I hear about people who trained or fed an AI bot to say, write a Biden Speech, or create poetry. I want to create a bot that pulls important numbers from my local community to post to social media — like lowest gas prices in town, weather average, mortgage interest rates and a couple other things. submitted by /u/No-Setting2541 [link] [comments]  ( 1 min )
    Artificial Nightmares: Disney Princess Ella || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Thinking of starting a personal project to filter profanity, wanted to air out my plan for recommendations before starting
    So, I'm not super comfortable with swearing, but there are a few YouTubers I follow that are as vulgar as they are hilarious. My plan, as it stands, is to use python in concert with some sort of generic voice to text software to get a list of timestamps for any F words, and then use that list of timestamps to generate an FFMPEG command to cut the audio at each timestamp and remerge into a new video. Does anyone have any recommendations for audio processing packages or better approaches? I'm keen to use anything with hardware acceleration (because making a script to abuse my shiny new graphics card is half my motivation). P.S. I haven't touched python much since graduating, can't wait to dig my claws back in. Although... if anyone has an audio processing package in C# that would be hecking cool too Have a nice day y'all submitted by /u/The_Real_Slim_Lemon [link] [comments]  ( 1 min )
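    One possible simplification of the FFMPEG step (assuming you already have word-level timestamps from the speech-to-text pass): instead of cutting and re-merging, build a single `volume` audio filter with timeline editing that mutes exactly the flagged spans, leaving the video stream untouched. The timestamps below are hypothetical:

```python
# Build an ffmpeg "volume" filter that mutes audio only inside given spans.
def mute_filter(spans):
    conds = "+".join(f"between(t,{s},{e})" for s, e in spans)
    return f"volume=enable='{conds}':volume=0"

spans = [(12.4, 12.9), (47.0, 47.6)]  # hypothetical (start, end) seconds
filt = mute_filter(spans)
# Copy the video stream; only the audio is re-encoded with the filter.
cmd = ["ffmpeg", "-i", "in.mp4", "-c:v", "copy", "-af", filt, "out.mp4"]
print(filt)
```

Running `cmd` via `subprocess.run` would produce a single output file with the flagged words silenced, no cut-and-remerge needed.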
    Machine learning, harnessed to extreme computing, aids fusion energy development
    Linking techniques from machine learning with advanced numerical simulations, MIT researchers take an important step in state-of-the-art predictions for fusion plasmas.  ( 6 min )
    MoLeR: Creating a path to more efficient drug design
    Drug discovery has come a long way from its roots in serendipity. It is now an increasingly rational process, in which one important phase, called lead optimization, is the stepwise search for promising drug candidate compounds in the lab. In this phase, expert medicinal chemists work to improve “hit” molecules—compounds that demonstrate some promising properties, […] The post MoLeR: Creating a path to more efficient drug design appeared first on Microsoft Research.  ( 6 min )
    Answers Blowin’ in the Wind: HPC Code Gives Renewable Energy a Lift
    A hundred and forty turbines in the North Sea — and some GPUs in the cloud — pumped wind under the wings of David Standingford and Jamil Appa’s dream. As colleagues at a British aerospace firm, they shared a vision of starting a company to apply their expertise in high performance computing across many industries. Read article > The post Answers Blowin’ in the Wind: HPC Code Gives Renewable Energy a Lift appeared first on NVIDIA Blog.  ( 4 min )
    What Is Conversational AI? ZeroShot Bot CEO Jason Mars Explains
    Entrepreneur Jason Mars calls conversation our “first technology.” Before humans invented the wheel, crafted a spear or tamed fire, we mastered the superpower of talking to one another. That makes conversation an incredibly important tool. But if you’ve dealt with the automated chatbots deployed by the customer service arms of just about any big organization Read article > The post What Is Conversational AI? ZeroShot Bot CEO Jason Mars Explains appeared first on NVIDIA Blog.  ( 2 min )
    TinyML, an underrated field of Machine Learning
    TinyML is a groundbreaking technology! Possessing a lot of potential it is sure to grow exponentially in the coming years.  ( 3 min )
    Calculating where projective lines intersect
    A couple days ago I wrote about homogeneous coordinates projective planes. I said that the lines y = 5 and y = 6 intersect in a point “at infinity.” In projective geometry any two distinct lines intersect in exactly one point, and you can compute that intersection point the same way, whether the intersection is […] Calculating where projective lines intersect first appeared on John D. Cook.  ( 4 min )

    [D] Transformer-Models-from-Scratch
    I recently started learning machine learning, and I have implemented several transformer models for different tasks from scratch in PyTorch in my GitHub repository: Transformer-Models-from-Scratch The notebooks are self-contained, and I also included a note I wrote on transformers. Hope it's helpful for anyone learning the transformer model! Let me know if you have any comments! submitted by /u/hbchen-one [link] [comments]
    [P] Curated List of Company Blogs about MLops/ Infra
    Hi all. I am starting a github repo to compile a list of company blogs about their MLops/ infra. Please feel free to contribute if you are interested: https://github.com/enochkan/awesome-ml-stack submitted by /u/kanxx030 [link] [comments]
    [P] TorToiSe - a true zero-shot multi-voice TTS engine
    I'd like to show off a TTS system I have been working on for the past year. I've open-sourced all the code and the trained model weights: https://github.com/neonbjb/tortoise-tts This was born out of a desire to reproduce the original DALLE with speech. It is "zero-shot" because you feed the text and examples of a voice to mimic as prompts to an autoregressive LLM. I think the results are fantastic. Here are some samples: https://nonint.com/static/tortoise_v2_examples.html Here is a colab in which you can try out the whole system: https://colab.research.google.com/drive/1wVVqUPqwiDBUVeWWOUNglpGhU3hg_cbR submitted by /u/neonbjb [link] [comments]  ( 1 min )
    [P] Baseten – Build ML-powered applications
    Hey, we've been building Baseten to be able quickly deploy models, backends and frontends. I'd love to get your feedback. submitted by /u/Available-Cookie2754 [link] [comments]
    [N] Upcoming talk on Data centric approach to AI from experience at Youtube, ScaleAI and Apple
    Hey folks, There’s an upcoming free talk on May 13. This is what I know: Vijay K, Head of Engineering at Scale.ai, and Mike Wu, Stanford PhD in Machine Learning, are going to be talking about strategies for taking a data-centric approach to AI, and Vijay’s lessons from doing this at Apple, YouTube, and Scale AI. There’s a renewed focus on the data layer as a foundation for successful ML projects, and Vijay participated in this transformation firsthand. You’ll be able to hear his reflections and learnings; should be super useful! Sign-up link is here, see you there. submitted by /u/sb2nov [link] [comments]  ( 1 min )
    [News] Adept AI Labs Launches
    This seems like a pretty powerhouse team adept.ai/post/introducing-adept submitted by /u/mrpogiface [link] [comments]
    [D] Is it possible to train a very large 6B model to learn from a single training input 1 2 3 to infer 6 7 8 (9)?
    Hi, I was wondering how long it would take, or if it would be possible, for a model with 6 billion parameters to become able to predict that for 6 7 8 the next number should be 9, given only a single training example 1 2 3. Let's say we use an architecture like the ones used in GPT. Essentially, I am trying to make sense of something here: is the human design component in DL the weak link in the AGI chain? Much like we could not achieve a true AI by manually coding each rule with ifs and elses, perhaps we cannot achieve true intelligence by manually designing the networks. Can we grow a model organically so it immediately gets this right from a single training example, and continue using the same network to keep on learning from rich input and making new observations? For guidance, the network can use what it already knows, learning outward. A pre-programmed inherent ability to find motifs, for example with STUMPY and time series analysis, allows it to make relationship observations, and is how the agent immediately guesses that a numeric pattern increments by 1 from a single training example. Putting this into DL, perhaps we can achieve something less organic but still good using a model-agnostic architecture with multi-resolution 'tiles' of neurons. Some property of the information would theoretically allow deciding if a tile should be upgraded to a higher resolution (more neurons), and a side buffer keeps track of the connections between these tiles and tries to move the topology forward, adding some small clusters, removing others, attempting to connect them, etc. submitted by /u/o_snake-monster_o_o_ [link] [comments]  ( 2 min )
    [D] Understanding the use of EMA in Diffusion models
    Reading the original diffusion models paper and the improved diffusion model by OpenAI, I noticed they use EMA (exponential moving average) to update the parameters of the models. So I started looking at the code OpenAI published for their version of diffusion models. In the code, I see that during training the model's params are stored in a variable called "master_params", and then they create a deep copy of the params and call them ema_params. Looking at the "optimize_normal" method, I see that they update the model params using AdamW and gradient descent, and then after that they update the ema_params variable using the EMA equation. That means the actual model params take a full gradient descent step toward the minimum of the loss function, and then they take a pseudo-step from the original parameters before the optimizer, moving them closer to the params after the optimizer. But looking at the rest of the code, all I see is that they save a checkpoint of the ema_params to disk but never update the model params using them or anything. So my question is: what is the EMA for if it is not used during training and the model is fully updated using "classical" machine learning optimization with gradient descent? Is it that only at inference time do they load the EMA params to generate images, instead of the regular params that were updated using AdamW? submitted by /u/eyalmazuz [link] [comments]  ( 2 min )
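    For anyone following along, the mechanics being described reduce to a few lines. This toy sketch (plain floats standing in for tensors; it mirrors the scheme as described, not OpenAI's actual code): the master params take the real optimizer steps, the shadow copy is only nudged toward them each step, and the shadow copy is what gets loaded at inference:

```python
# EMA of parameters: a shadow copy trails the optimizer's trajectory.
decay = 0.999

def ema_update(ema, params):
    return [decay * e + (1 - decay) * p for e, p in zip(ema, params)]

params = [1.0, -2.0]
ema = list(params)  # copy at init, mirroring master_params -> ema_params
for _ in range(3):
    params = [p - 0.1 for p in params]   # stand-in for an AdamW/gradient step
    ema = ema_update(ema, params)

print(params, ema)  # ema lags behind the fast-moving master params
```

The EMA never feeds back into training; it simply averages out step-to-step noise in the weights, which is why the smoothed copy is the one used to generate samples.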
    [P] Benchmarking and profiling Hugging Face training with Graphsignal
    We've recently added Hugging Face support to https://github.com/graphsignal/graphsignal profiler, which I'd like to share in case someone finds it useful in their efforts to optimize speed and compute. More details, code and screenshots in the blog post https://graphsignal.com/blog/benchmarking-and-profiling-hugging-face-training-with-graphsignal/. submitted by /u/l0g1cs [link] [comments]
    [N] MIT/Meta AI released their new SOTA unsupervised sentence embedding model "DiffCSE"
    Researchers from MIT/Meta recently released a new framework for unsupervised sentence embedding. The performance seems to be better than SimCSE, the previous SOTA, by 2.3 absolute points on downstream tasks. The pretrained models are available on Huggingface. GitHub: https://github.com/voidism/DiffCSE arXiv: https://arxiv.org/abs/2204.10298 submitted by /u/virtualenv [link] [comments]  ( 1 min )
    [D] What do you think of the double standard where an AI learning from copyrighted material is “stealing”, but when a human does it that’s just education?
    Some useful optional considerations: Assume copies of the original copyrighted work are owned lawfully. Assume the AI maintains a single active copy to avoid group performance. The implications this has on banning possibly uploading your own consciousness to save your life on copyright grounds. This “stealing” concept people jump to seems to bring up some interesting logical contradictions. What do you think? submitted by /u/sext-scientist [link] [comments]  ( 2 min )
    [News] New Jupyter Notebook competition
    Are you passionate about coding, data science or Earth observation? https://preview.redd.it/wfwk9ifo4vv81.png?width=1920&format=png&auto=webp&s=5f11885bd8efe88986c181b565cc160534634f0b We're looking for bright-minded people from around the world to showcase their skills and develop new Jupyter Notebooks using Copernicus data! Sound interesting? Find out more here: https://www.eumetsat.int/science-blog/new-jupyter-notebook-competition submitted by /u/EUMETSAT [link] [comments]
    [P] K-Prototypes and evaluating model performance and drift
    I have reached a blocking point in a current project. We are using K-Prototypes to segment two populations (one with 100k elements and the other with 1M). In order to evaluate the clustering, during our K-Means stage we used silhouettes which are already implemented in sklearn. For the second stage, this became a problem. Either it was a time issue or a memory issue. So, for the 100k dataset, finding the silhouette profile took around 50 hours using a custom distance metric function. While this is feasible in the project's scope, for the 1M dataset the computation time would be highly impractical. As such, at the moment, we are stuck without a proper evaluation metric for our model. Sure, we can run Davies-Bouldin or other similar metrics, but the silhouette profile gave us much more detailed information. We are also moving the project to databricks. At first, the pyspark clustering evaluator had me hopeful, but it has very limited options regarding distance metrics. This is also an issue because the model is to be deployed in production and should have some metric informing when it needs to be retrained. While this point is still fluid, this is the preferred course of action. Has anyone faced similar issues using K-Prototypes? Or just silhouette profiles with custom distance metrics? submitted by /u/CaptMartelo [link] [comments]  ( 1 min )
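    A common workaround when exact silhouettes are infeasible at 1M rows is to score repeated subsamples and report the spread of the estimates. The scoring function below is a stand-in for `silhouette_score` with your custom distance metric; the shape of the loop is the point:

```python
import random

# Estimate an expensive clustering metric on repeated subsamples
# instead of one full (50-hour) pass over the dataset.
random.seed(0)
data = list(range(100_000))  # stand-in for the full dataset

def subsampled_scores(data, score_fn, n=1000, repeats=5):
    return [score_fn(random.sample(data, n)) for _ in range(repeats)]

# score_fn here is a toy placeholder returning a value in [0, 1].
scores = subsampled_scores(data, score_fn=lambda s: sum(s) / len(s) / 100_000)
print(len(scores))  # 5 estimates; their mean and spread approximate the full score
```

Stratifying the subsamples by cluster label keeps small clusters represented, and the spread across repeats doubles as a cheap drift signal in production.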
    [P] We cleaned up Pascal and improved mAP by 13%
    How important is clean data for how your AI models perform? According to our experiments - very important. Using state-of-the-art confidence learning to clean up PASCAL, two people improved our primary model metric by 13% in a week. To learn more about our results and what we did check out our article: https://hasty.ai/content-hub/articles/cleaning-pascal-improving-map-by-13?utm_source=mk832ksa Disclaimer: We used our own platform to clean up the data and the article, therefore, contains self-promotion. However, the article mainly focuses on the results we achieved. submitted by /u/treebeard_hasty_ai [link] [comments]  ( 3 min )
    [R][P] Using language models for molecule captioning and text-based molecule generation
    Hi. We recently did some work on using language models for molecule captioning and text-based molecule generation. You can think of it as doing translation between molecules and natural language. Would love to know if you have any feedback 🤗. Arxiv: https://arxiv.org/abs/2204.11817 submitted by /u/SimilarShape9122 [link] [comments]  ( 1 min )
    [D] Making text-to-image even better - GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models, a 5-minute paper summary by Casual GAN Papers
    “Diffusion models beat GANs”. While true, the statement comes with several ifs and buts, not to say that the math behind diffusion models is not for the faint of heart. Alas, GLIDE, an OpenAI paper from last December, took a big step towards making it true in every sense. Specifically, it introduced a new guidance method for diffusion models that produces higher quality images than even DALL-E, which uses expensive CLIP reranking. And if that wasn’t impressive enough, GLIDE models can be fine-tuned for various downstream tasks such as inpainting and text-based editing. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/289 Blog post: https://www.casualganpapers.com/faster-diffusion-models-text-to-image-classifier-free-guidance/GLIDE-explained.html GLIDE arxiv / code Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
    Guide to Iteratively Tuning GNNs
    submitted by /u/aidev2040 [link] [comments]
    Exploring Neural Networks Visually in the Browser
    submitted by /u/nickb [link] [comments]
    What has priority in the performance?
    I'm really new to neural networks and I was wondering how to speed up the process of selecting the best one during training. What I mean is: among the training/validation/test split ratios, the number of hidden layers, the type of activation functions, and the size of each layer, how do I iterate to find the optimal combination without having to try every little mix? Is there a way to rank the impact of these 4 components on the MSE and iterate one at a time to select each aspect? Thanks in advance submitted by /u/beppegrosso97 [link] [comments]  ( 1 min )
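    Because those four knobs interact, ranking them and tuning one at a time can mislead; a cheap, standard baseline is random search over the joint space. A sketch where `evaluate` is a hypothetical stand-in for a train-and-score routine returning validation MSE:

```python
import random

# Random search over the joint hyperparameter space: sample configurations,
# evaluate each, and keep the one with the lowest validation MSE.
random.seed(0)
space = {
    "hidden_layers": [1, 2, 3],
    "layer_size": [16, 32, 64, 128],
    "activation": ["relu", "tanh"],
    "val_split": [0.1, 0.2, 0.3],
}

def sample_config():
    return {k: random.choice(v) for k, v in space.items()}

def evaluate(cfg):
    # Hypothetical placeholder: train briefly with cfg, return validation MSE.
    return random.random()

best = min((sample_config() for _ in range(20)), key=evaluate)
print(sorted(best))  # keys of the winning configuration
```

With a fixed trial budget, random search usually beats changing one knob at a time because every trial probes all dimensions at once.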
  • Open

    Yuval Noah Harari: "One of the things many people don't realize about the AI revolution and the automation revolution: They imagine it as some kind of a one-time event ... This is an extremely unlikely scenario, because we are nowhere near the maximum potential of AI." (3-min. clip)
    submitted by /u/frog9913 [link] [comments]  ( 1 min )
    Dreamy Trippy AI generated Video! VQGAN CliP Rife-RealESRGAN upscale...
    submitted by /u/LordPewPew777 [link] [comments]
    AI Dream 33 - Battlestar Galactica Nebula Explosion
    submitted by /u/LordPewPew777 [link] [comments]
    US: Cisco and Verizon collaborated on a successful proof-of-concept demo in Las Vegas to 'meet the latency thresholds required for autonomous driving applications – replacing the costly roadside radios previously required to meet those needs.'
    submitted by /u/dannylenwinn [link] [comments]  ( 1 min )
    Last Week in AI: AI uses in government surveillance, AI teaches human drivers, actors union opposes AI actors, and more!
    submitted by /u/regalalgorithm [link] [comments]
    AI Dream 44 - Epic Cathedral Supernatural Visit
    submitted by /u/LordPewPew777 [link] [comments]
    Resources
    Do you know reliable sites where I can learn artificial intelligence? (I'm studying computer engineering, but I already want to study something in advance.) submitted by /u/oraudev [link] [comments]  ( 1 min )
    The Future of Apps: Intelligence
    https://blog.r2c.io/the-future-of-apps-intelligence/ submitted by /u/R2Consulting [link] [comments]
    We cleaned up Pascal and improved mAP by 13%
    How important is clean data for how your AI models perform? According to our experiments - very important. Using state-of-the-art confidence learning to clean up PASCAL, two people improved our primary model metric by 13% in a week. To learn more about our results and what we did check out our article: https://hasty.ai/content-hub/articles/cleaning-pascal-improving-map-by-13?utm_source=da39a3ee Disclaimer: We used our own platform to clean up the data and the article, therefore, contains self-promotion. However, the article mainly focuses on the results we achieved. submitted by /u/treebeard_hasty_ai [link] [comments]  ( 1 min )
    Interested in cognitive science? "Joscha Bach Bits" is a new YouTube channel dedicated to the renowned cognitive scientist Joscha Bach...
    As featured on the Lex Fridman podcast, the Singularity weblog podcast and the Future of Life Institute podcast The channel features shorts of Joscha's opinions and perspectives edited from podcasts. You can check out the trailer, which mostly consists of podcast hosts' minds imploding. All channel videos Channel playlists Channel creator: /u/24karate Enjoy 🤖 submitted by /u/tasinet [link] [comments]  ( 1 min )
    Machine learning's abiding weakness is verification
    submitted by /u/koavf [link] [comments]
    Artificial Nightmares: Monsters Inc || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    A.I. Is Mastering Language. Should We Trust What It Says? • OpenAI’s GPT-3 and other neural nets can now write original prose with mind-boggling fluency — a development that could have profound implications for the future.
    submitted by /u/Naurgul [link] [comments]  ( 1 min )
  • Open

    Build and deploy a scalable machine learning system on Kubernetes with Kubeflow on AWS
    In this post, we demonstrate Kubeflow on AWS (an AWS-specific distribution of Kubeflow) and the value it adds over open-source Kubeflow through the integration of highly optimized, cloud-native, enterprise-ready AWS services. Kubeflow is the open-source machine learning (ML) platform dedicated to making deployments of ML workflows on Kubernetes simple, portable and scalable. Kubeflow provides many […]  ( 15 min )
    Create random and stratified samples of data with Amazon SageMaker Data Wrangler
    In this post, we walk you through two sampling techniques in Amazon SageMaker Data Wrangler so you can quickly create processing workflows for your data. We cover both random sampling and stratified sampling techniques to help you sample your data based on your specific requirements. Data Wrangler reduces the time it takes to aggregate and […]  ( 7 min )
    Part 4: How NatWest Group migrated ML models to Amazon SageMaker architectures
    The adoption of AWS cloud technology at NatWest Group means moving our machine learning (ML) workloads to a more robust and scalable solution, while reducing our time-to-live to deliver the best products and services for our customers. In this cloud adoption journey, we selected the Customer Lifetime Value (CLV) model to migrate to AWS. The […]  ( 12 min )
    Part 3: How NatWest Group built auditable, reproducible, and explainable ML models with Amazon SageMaker
    This is the third post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS Professional Services to build a new machine learning operations (MLOps) platform. This post is intended for data scientists, MLOps engineers, and data engineers who are interested in building ML pipeline templates with Amazon SageMaker. […]  ( 8 min )
    Part 2: How NatWest Group built a secure, compliant, self-service MLOps platform using AWS Service Catalog and Amazon SageMaker
    This is the second post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS Professional Services to build a new machine learning operations (MLOps) platform. In this post, we share how the NatWest Group utilized AWS to enable the self-service deployment of their standardized, secure, and compliant MLOps […]  ( 12 min )
    Part 1: How NatWest Group built a scalable, secure, and sustainable MLOps platform
    This is the first post of a four-part series detailing how NatWest Group, a major financial services institution, partnered with AWS to build a scalable, secure, and sustainable machine learning operations (MLOps) platform. This initial post provides an overview of how the AWS and NatWest Group joint team implemented Amazon SageMaker Studio as the standard for […]  ( 11 min )
    Accelerate data preparation with data quality and insights in Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler is a new capability of Amazon SageMaker that helps data scientists and data engineers quickly and easily prepare data for machine learning (ML) applications using a visual interface. It contains over 300 built-in data transformations so you can quickly normalize, transform, and combine features without having to write any code. Today, […]  ( 7 min )
  • Open

    Reward after each action vs reward after taking all actions in an episodic environment of N steps
    Hello, I am working on compression of deep neural networks using reinforcement learning. There is one agent that learns to compress convolutional layers using C actions and another that compresses dense layers using D actions. If there are two conv layers and three dense layers, five actions have to be selected in sequence using both agents in order to fully compress the model. I read the AdaDeep paper and found it really useful for my research, but I don't get why the authors select all actions and only calculate the reward after completely compressing the network, instead of getting the reward after each action. In their place, I would select the action, calculate the reward of that action, and store it in the replay buffer. By only using the immediate reward, the agent should be able to learn which sequence of actions works best for the current model. Why assign the same reward to the selected actions for each layer? Is it only because the outcome was due to the combination of actions and they want to speed up training? If my understanding is correct, assigning the immediate reward to each action would yield the same results in the long run. Thanks in advance. submitted by /u/ElvishChampion [link] [comments]  ( 1 min )
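One way to see the two credit-assignment schemes side by side is how transitions land in the replay buffer. A toy sketch (all names hypothetical) contrasting per-step rewards with the AdaDeep-style scheme where every action of the episode shares the terminal reward:

```python
def per_step_transitions(actions, step_reward):
    """Each action gets its own immediate reward."""
    return [(a, step_reward(a)) for a in actions]

def terminal_transitions(actions, episode_reward):
    """AdaDeep-style: every action in the episode shares the final reward."""
    r = episode_reward(actions)
    return [(a, r) for a in actions]

# Toy rewards: the per-step signal scores each action in isolation; the
# episodic signal depends on the whole sequence (e.g., accuracy of the
# fully compressed network), which a per-step reward cannot observe.
actions = ["prune_conv1", "prune_conv2", "quantize_fc1"]
print(per_step_transitions(actions, step_reward=lambda a: len(a) / 10))
print(terminal_transitions(actions, episode_reward=lambda seq: len(seq) * 0.5))
```

The difference matters exactly when per-layer rewards do not decompose: if compressing conv1 only hurts accuracy in combination with a later choice at fc1, an immediate reward measured after the conv1 step cannot reflect that interaction.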
    Multi Agent RL: agents act in different frequencies?
    After reading D, Multi's post, I'm wondering: is it possible for two different agents to take actions in their own action spaces and at their own frequencies? submitted by /u/YMXin1999 [link] [comments]
    I’m going to build a game where the goal will be making many deliveries using the shortest route. How might the environment best be represented to the agent?
    The environment makes a lot of sense to me, as a human. It’s a network composed of nodes and edges. Nodes are the points in the network, and edges are the lines connecting them. The entire thing resembles a real street network and uses coordinate points to position itself on a graph. Of the entire map, select nodes will represent gas stations, and other select nodes will represent stop locations for deliveries. The agent will start at a node and map a route to each location, essentially by stringing together an array of connected edges. It’ll also need to travel to gas stations every X miles or it’ll run out of gas. Now, I’ve never done this before so I’m gonna bounce some of my ideas off a wall here Passing the entire thing to an agent and having it render a graph and whatnot to determ…  ( 2 min )
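For the node/edge representation itself, a plain adjacency map plus Dijkstra covers the "shortest route between stops" building block; the agent's observation can then be the current node, the remaining delivery stops, and the fuel left. A minimal sketch with hypothetical node names:

```python
import heapq

# Adjacency map: node -> list of (neighbor, edge_length_in_miles).
graph = {
    "depot": [("a", 2.0), ("b", 5.0)],
    "a": [("depot", 2.0), ("b", 1.0), ("gas", 3.0)],
    "b": [("depot", 5.0), ("a", 1.0), ("stop1", 2.0)],
    "gas": [("a", 3.0)],
    "stop1": [("b", 2.0)],
}

def shortest_distance(graph, start, goal):
    """Dijkstra over the adjacency map; returns total miles."""
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, w in graph[node]:
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return float("inf")

print(shortest_distance(graph, "depot", "stop1"))  # depot -> a -> b -> stop1
```

Whether you then hand the agent raw adjacency lists, pairwise shortest-distance features, or a learned graph embedding is a separate design choice; the point is that the map itself compresses cleanly into this structure.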
    What is a train step counter?
    In this repository that I'm looking at, there's an input variable whose meaning I don't understand. At line 135: https://github.com/google-research/google-research/blob/722494ce68130a7409bf94500002c79014905d53/social_rl/multiagent_tfagents/multiagent_ppo.py#L135 train_step_counter: An optional counter to increment every time the train op is run. Defaults to the global_step. What is train_step_counter? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    exponential weighted average
    hey guys, I guess this can be trivial for most of you but I can't get around it: how do I prove this is an exponential weighted average? Thanks. The equation is labeled (10) in the link https://chowdera.com/2021/12/202112200806190809.html, from Reinforcement Learning: An Introduction by Sutton and Barto, tracking a nonstationary problem in Chapter 2. submitted by /u/ma7modbasha [link] [comments]  ( 1 min )
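A sketch of the standard derivation, assuming the linked Eq. (10) is Sutton and Barto's constant-step-size update (their Eq. 2.6). Start from the recurrence and unroll it down to $Q_1$:

```latex
Q_{n+1} = Q_n + \alpha \left( R_n - Q_n \right)
        = \alpha R_n + (1-\alpha) Q_n
        = (1-\alpha)^n Q_1 + \sum_{i=1}^{n} \alpha (1-\alpha)^{n-i} R_i
```

The weight $\alpha(1-\alpha)^{n-i}$ on reward $R_i$ decays exponentially with its age $n-i$, and the weights sum to one, since $(1-\alpha)^n + \sum_{i=1}^{n} \alpha(1-\alpha)^{n-i} = (1-\alpha)^n + 1 - (1-\alpha)^n = 1$. A convex combination with exponentially decaying weights is exactly what "exponentially weighted average" means.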
  • Open

    Data quality: What and why is it important?
    With the internet producing quintillions of bytes of readily available information per day, you could be forgiven for thinking that data is losing its value. Apparently, data is one of those weird commodities that go up in value the more available they are; or perhaps we haven’t produced enough to reach the demand-supply equilibrium. Virtually all companies… Read More »Data quality: What and why is it important? The post Data quality: What and why is it important? appeared first on Data Science Central.  ( 3 min )
    2 Ways in Which Automatic Data Labeling Saves Time and Costs
    Data scientists face a problem: machine learning models need to be trained on labeled datasets, but labeling the data is tedious and time-consuming. Enter automatic data labeling, in which most of the preprocessing work is done by a computer.  At first glance, automatic data labeling sounds too good to be true. Of course, more automation… Read More »2 Ways in Which Automatic Data Labeling Saves Time and Costs The post 2 Ways in Which Automatic Data Labeling Saves Time and Costs appeared first on Data Science Central.  ( 4 min )
    4 Successful Integrated Marketing Communications Examples
    Integrated Marketing Communications (IMC) is an effective communication process that is intended to strengthen the relationship between the customer and the company while enhancing the company’s sales. IMC utilizes a combination of traditional and new approaches in marketing. IMC uses the channel that is most effective to reach the customer. This blog will look at… Read More »4 Successful Integrated Marketing Communications Examples The post 4 Successful Integrated Marketing Communications Examples appeared first on Data Science Central.  ( 4 min )
    No, AI won’t replace astronauts – and here’s why
    A new book predicts artificial intelligence will soon replace astronauts. The authors posit that robots are cheaper, more reliable, and better suited to space travel. But with the human desire for exploration, AI is unlikely to replace astronauts fully. AI will close the gap with human capabilities in the next few decades and surpass them… Read More »No, AI won’t replace astronauts – and here’s why The post No, AI won’t replace astronauts – and here’s why appeared first on Data Science Central.  ( 4 min )
    Smart Factory- Building Future with 5G
    The implementation of digital technologies blurs the line between the physical and digital world. It has become clear that there is a strong need for digital transformation to achieve the next level of efficiency, connectivity, and flexibility needed in manufacturing to weather modern-day disruptions, risks, and fluctuating demands. 5G SMART believes that 5G will be… Read More »Smart Factory- Building Future with 5G The post Smart Factory- Building Future with 5G appeared first on Data Science Central.  ( 2 min )
    Using Stakeholder Journey Maps to Re-invent, not Just Optimize, Your Business Processes
    Stakeholder Journey Maps are a fabulous tool to intimately understand what a stakeholder is trying to accomplish (their objectives and intentions) and the steps/actions/decisions that stakeholder needs to make to complete their journey. Stakeholder Journey Maps are commonly used to help designers to create the optimal user interface and nicely segue into UI storyboards and… Read More »Using Stakeholder Journey Maps to Re-invent, not Just Optimize, Your Business Processes The post Using Stakeholder Journey Maps to Re-invent, not Just Optimize, Your Business Processes appeared first on Data Science Central.  ( 5 min )
    An analysis of Digital Twin Applications across industries
    Background Digital Twins are virtual representations of physical objects, and they can be connected with their physical counterparts. Through this connection, Digital Twins contribute to the convergence of the real and the virtual world. While the Digital twin’s concept is focused on the manufacturing industry, the paper “Dimensions of Digital Twin Applications – A Literature… Read More »An analysis of Digital Twin Applications across industries The post An analysis of Digital Twin Applications across industries appeared first on Data Science Central.  ( 6 min )
    Make Sure Your Online Data Science Courses Teach These 6 Core Skills
    Data science is a wide field with many specializations, and an individual can have a great career with a data science degree. However, curriculums vary between schools, and the specific data science classes taught in one school may not be taught in another. There are several core skills in the data science field that recruiters… Read More »Make Sure Your Online Data Science Courses Teach These 6 Core Skills The post Make Sure Your Online Data Science Courses Teach These 6 Core Skills appeared first on Data Science Central.  ( 3 min )
    What Personal Knowledge Graphs Have to Do with Business
    I help lead a working group focused on personal knowledge graphs (PKGs). Lately, it’s functioned as a discussion and demo evaluation group for new technologies and how they might be used in a knowledge graph context.   Different individuals want to annotate different kinds of data. Some do a lot of research. For them, the need is… Read More »What Personal Knowledge Graphs Have to Do with Business The post What Personal Knowledge Graphs Have to Do with Business appeared first on Data Science Central.  ( 4 min )
    Healthcare App Development: Why You Should Opt for React
    The world of healthcare has consistently evolved, yes, but the fact remains it has gone through tremendous change ever since the coronavirus pandemic started, thus driving the need for modern solutions to meet the increasingly varying needs of patients. In this context, mobile apps have proven to be the leading tool that has driven focus… Read More »Healthcare App Development: Why You Should Opt for React The post Healthcare App Development: Why You Should Opt for React appeared first on Data Science Central.  ( 3 min )
    Why Agile Often Fails and What to Do When It Happens
    Agile, Agile 2 and Agility, Part III In the previous articles in this series, we discussed the role that agile digital delivery capabilities plays in your company’s competitiveness and why rapid delivery is so important.  This article will look at the many reasons that Agile adoptions frequently fail to deliver what companies expect and suggest… Read More »Why Agile Often Fails and What to Do When It Happens The post Why Agile Often Fails and What to Do When It Happens appeared first on Data Science Central.  ( 7 min )
    How to Protect Your Computer Data
    When it comes to protecting your computer, you can do a few basic things. First, create separate user accounts for work and personal data. Make sure to back up your data and use a firewall. Also, make sure to encrypt it. You should make sure to back up any important documents or photos you may have… Read More »How to Protect Your Computer Data The post How to Protect Your Computer Data appeared first on Data Science Central.  ( 4 min )
  • Open

    Misconceptions about AI, Robotics, and Machine Learning
    Yes and no. So, if you asked this question, good one! When I was new to this stuff, I had the same question and searched up a lot about it.  ( 3 min )
  • Open

    In the NVIDIA Studio: April Driver Launches Alongside New NVIDIA Studio Laptops and Featured 3D Artist
    This week In the NVIDIA Studio, we’re launching the April NVIDIA Studio Driver with optimizations for the most popular 3D apps, including Unreal Engine 5, Cinema4D and Chaos Vantage. The driver also supports new NVIDIA Omniverse Connectors from Blender and Redshift. The post In the NVIDIA Studio: April Driver Launches Alongside New NVIDIA Studio Laptops and Featured 3D Artist appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    A smarter way to develop new drugs
    A new artificial intelligence technique only proposes candidate molecules that can actually be produced in a lab.  ( 6 min )
  • Open

    Random Blaschke products and Mathematica binding
    A Blaschke product is a function that is the product of Blaschke factors, functions of the form b(z; a) = (|a| / a) · (a – z) / (1 – a* z), where the complex number a lies inside the unit circle and a* is the complex conjugate of a. I wanted to plot Blaschke products with random […] Random Blaschke products and Mathematica binding first appeared on John D. Cook.  ( 2 min )
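The post itself uses Mathematica, but a rough Python analogue of the experiment is easy to sketch: build a product of Blaschke factors for random parameters in the unit disk and check the defining property that the product maps the unit circle to the unit circle.

```python
import numpy as np

def blaschke_factor(z, a):
    """One Blaschke factor b(z; a) = (|a|/a) * (a - z) / (1 - conj(a) * z)."""
    return (abs(a) / a) * (a - z) / (1 - np.conj(a) * z)

def blaschke_product(z, params):
    """Product of Blaschke factors for parameters inside the unit disk."""
    result = np.ones_like(z, dtype=complex)
    for a in params:
        result *= blaschke_factor(z, a)
    return result

rng = np.random.default_rng(0)
# Uniform random points in the unit disk (sqrt makes the radius uniform in area).
params = np.sqrt(rng.uniform(0, 1, 5)) * np.exp(1j * rng.uniform(0, 2 * np.pi, 5))

theta = np.linspace(0, 2 * np.pi, 100)
z = np.exp(1j * theta)
B = blaschke_product(z, params)
# Blaschke products have modulus 1 everywhere on the unit circle.
print(np.max(np.abs(np.abs(B) - 1)))
```

Plotting `np.angle(B)` against `theta` (or the image of a grid inside the disk) reproduces the kind of pictures the post describes.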

  • Open

    [D] Hypothetically, what's the value in being able to label ~500 million images a day?
    I'm not going to make this vague. Specifically I'm seeing a lot of comments related to Elon Musk saying he's going to remove bots from Twitter. There's a lot of speculation on how this could be done, with comments suggesting a captcha-based system for every post/reply (or maybe every action?). More specifically people seem fixated on captcha systems that can't be botted. (Ignore for a moment the accessibility issues and audio fallbacks that might be required). I'm aware that Tesla transitioned to using more massive synthetic datasets for training, so this might be somewhat outdated. That said they do have a lot of data collection of new real world data from their sensors. This has me curious on the estimated value of large-scale captcha systems directly tied with a company that might need large-scale labeling services. I'm sure researchers here have run numbers on labeling services and how best to utilize them. (Goes without saying the prices vary quite a bit and cover a wide range of tasks from simple bounding boxes to relatively expensive polygons for semantic labeling. For example just as a reference: https://cloud.google.com/ai-platform/data-labeling/pricing ). Most services don't list prices for like 500 million tasks which makes sense given that's a lot. A captcha system would have independent users redoing tasks multiple times to find baselines, but in general it would average to find a ground truth label. This redundant work isn't wasted in this sense as it can find say difficult scenarios and refine labels. I could naively say 500 million / (10 USD/1000 tasks) = 5 million USD a day not counting server costs and development. Not super scientific though and it seems high. (I have doubts on if gathering 500 million samples of value a day to even get labeled is realistic). I digress, what would you say is the hypothetical value of such a system using short captcha tasks? submitted by /u/Sirisian [link] [comments]  ( 2 min )
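The back-of-envelope math from the post, made explicit. Every figure here is the post's own assumption (the labeling rate, the volume, the redundancy factor), not real pricing:

```python
tasks_per_day = 500_000_000      # hypothetical captcha-label volume from the post
usd_per_1000_tasks = 10.0        # rough labeling-service rate, also from the post
redundancy = 3                   # assumed independent solves per task for consensus

gross_value = tasks_per_day * usd_per_1000_tasks / 1000
effective_value = gross_value / redundancy  # redundant solves dilute per-label value

print(gross_value)       # the post's 5 million USD/day figure
print(effective_value)
```

The redundancy line is the part the post gestures at but doesn't price: if each image must be solved by several users to establish a ground-truth label, the naive figure shrinks by that factor.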
    [N] Modular Reasoning, Knowledge, & Language (MRKL) Hybrid System For More 'General' NLP
    AI21 Labs’ Modular Reasoning, Knowledge and Language (MRKL, pronounced “miracle”) system – embodied in Jurassic-X – includes one or more language models and augments them with external knowledge sources as well as symbolic reasoning experts that can handle tasks that lie beyond the reach of neural models. There are 55 different task-specific modules that MRKL currently supports. If the router is unsure which module is best, it calls on Jurassic-1. Jurassic also helps compose the contextual language around MRKL’s response. This allows MRKL to give factual answers with up-to-date information instead of being limited to its training data alone, and gives it the ability to carry out a much wider range of NLP tasks compared to other LLMs like Google's PaLM or OpenAI's GPT-3. AI21 blog and whitepaper here Video here submitted by /u/SlightSituation [link] [comments]  ( 1 min )
    [D] Opinions on NVIDIA TAO Toolkit?
    I'm working on an Edge ML product where we train models in the cloud and then run them on a device using tensorRT. We're considering switching to using the Nvidia TAO Toolkit for training. If you've used TAO, do you like it? Is it limiting? Our alternative is training in pytorch and then converting to ONNX and then tensorRT separately. Thanks! submitted by /u/linguistBot [link] [comments]  ( 1 min )
    [R][P] An arxiv-sanity-like view of ICLR 2022 papers
    submitted by /u/tanelai [link] [comments]  ( 1 min )
    [D] Paper Explained - ACCEL: Evolving Curricula with Regret-Based Environment Design (Video Walkthrough)
    https://youtu.be/povBDxUn1VQ Automatic curriculum generation is one of the most promising avenues for Reinforcement Learning today. Multiple approaches have been proposed, each with their own set of advantages and drawbacks. This paper presents ACCEL, which takes the next step in the direction of constructing curricula for multi-capable agents. ACCEL combines the adversarial adaptiveness of regret-based sampling methods with the capabilities of level-editing, usually found in Evolutionary Methods.
    OUTLINE:
    0:00 - Intro & Demonstration
    3:50 - Paper overview
    5:20 - The ACCEL algorithm
    15:25 - Looking at the pseudocode
    23:10 - Approximating regret
    33:45 - Experimental results
    40:00 - Discussion & Comments
    Website: https://accelagent.github.io Paper: https://arxiv.org/abs/2203.01302 submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [D] Parameter Efficiency without Computational Efficiency
    Hello hivemind. Context I am working on streamlined design strategies for (manual) neural architecture design. I recently came across a simple, receptive field-based strategy that allows me to reliably improve the architectures of some SOTA models like EfficientNet by up +1.5% top1-accuracy.That being said, there seems to be a trade-off between performance and number of computations I put into the same number of parameters.Basically, the more computationally expensive the architectural change is, the better the performance turns out to be. The number of parameters does not change in the process. The model becomes therefore more parameter efficient but computationally less efficient, which you could consider a somewhat Pyrrhic victory. Now to my question: Is there use for parameter efficiency in models when it does not also coincide with computational efficiency? Is there any literature on the topic you can recommend? submitted by /u/KrakenInAJar [link] [comments]  ( 1 min )
    [D] Calculating feature importance: which model/training set to use in the case of cross validation?
    So far I've only seen feature importance calculated on one model/training set at a time. However, with cross validation there are many models built on different training sets, so how do you calculate feature importance in this case? Just do it on one model? Or calculate feature importance for each model and somehow aggregate it (e.g., by averaging)? submitted by /u/Comprehensive-Egg707 [link] [comments]  ( 2 min )
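The averaging approach is straightforward to implement: fit one model per fold, collect its importances, then report the per-feature mean (for the ranking) and standard deviation (for how stable that ranking is across folds). A sketch using scikit-learn with toy data standing in for your own:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold

# Toy data; swap in your own X, y.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)

fold_importances = []
for train_idx, _ in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = RandomForestClassifier(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    fold_importances.append(model.feature_importances_)

# Aggregate across folds: the mean gives the ranking, the std shows stability.
mean_imp = np.mean(fold_importances, axis=0)
std_imp = np.std(fold_importances, axis=0)
print(mean_imp.round(3), std_imp.round(3))
```

A feature whose importance swings wildly between folds is telling you something the single-model number would hide, which is the main argument for aggregating rather than picking one fold.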
    [R] Recent Trends In Diffusion-Based Text-Conditional Image Synthesis
    Hello, I wrote a blog post about text-conditional image generation using diffusion models (including DALLE-2). Let me know what you think! https://sangyun884.github.io/recent-trends-in-diffusion-based-text-conditional/ submitted by /u/Impressive-Mirror430 [link] [comments]
    [N][P] Use GitHub Actions for ML with DagsHub Connect
    Hey r/MachineLearning, Nir from DagsHub here, and I'm thrilled to share a project we're launching today that will hopefully unlock GitHub Actions for ML. GitHub Actions solved a DevOps burden many of us felt by providing an easy-to-configure CI/CD tool to build, test, and deploy pipelines. However, when it comes to ML pipelines, and working with data, models, and experimentation in mind, the workflow is not as well defined and can get tricky to implement. DagsHub is kind of like GitHub for machine learning, which extends what GitHub did for code management to track data, models, experiments, and data pipelines. We do this by integrating awesome open source tools like DVC, MLflow, Label Studio, and more. One of our most requested features was a deeper integration with GitHub that will let…  ( 1 min )
    [R] ICLR 2022 Blog Post: The 37 Implementation Details of Proximal Policy Optimization
    Hi folks, our ICLR 2022 blog post on "The 37 Implementation Details of Proximal Policy Optimization" is live 😀 Our post makes it easier to understand the nitty-gritty of PPO's implementation, with 1) 🎥 video tutorials 2) 📜 detailed references and explanations 3) ⌨️ really simple code. Here are the links: Official URL: https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/ Twitter thread: https://twitter.com/vwxyzjn/status/1518589115163369472 OpenReview link: https://openreview.net/forum?id=Hl6jCqIp2j GitHub repo: https://github.com/vwxyzjn/ppo-implementation-details YouTube tutorial on PPO: https://www.youtube.com/playlist?list=PLD80i8An1OEHhcxclwq8jOMam0m0M9dQ_ I am the main author, feel free to ask me anything here. submitted by /u/vwxyzjn [link] [comments]  ( 1 min )
    [N] Learn how the open-source ecosystem can be used in your machine learning and data science classes.
    Hey, 🤗 Hugging Face is offering a workshop (June 6) for instructors of machine learning and data science who would like to learn how the open-source ecosystem can be used in their classes. After this workshop, you will know how to: 🧑‍💻 Teach Transformers models & famous ML libraries 🤖 Onboard students to the Hub to build/host projects 💾 Publish models/datasets in a few lines of code During the workshop, you will be invited to join the following page for a better understanding of our open-source solutions: https://huggingface.co/teach For more details about the workshop content, visit: https://hf.co/teaching Feel free to register here:) submitted by /u/VioletteLep [link] [comments]  ( 1 min )
    [D] Does anyone know a large varied image dataset that do NOT contain humans?
    It could be pictures of anything except humans, though it would be better if it were not focused on a single topic like dog images. submitted by /u/TheManveru [link] [comments]  ( 1 min )
    [R] A new dataset and a library that you can use for ML and RL over the Web
    TL;DR: Download dataset of labelled Web pages, WebTraversalLibrary for scripting web interactions Hi everyone! Our group at Klarna has been putting in a ton of work into deep learning for the Web over the past few years and we've made a couple of useful resources available for the research community. You might find them interesting if you're looking for new ideas for spare-time or even post-grad research projects. We've open-sourced a dataset of about 50k labeled product web pages from roughly 8000 distinct e-commerce merchants, available as MHTML and WebTraversalLibrary clones (see next point :) ), along with the corresponding screenshots. Not all of the MHTMLs render correctly, but the ones that do also have screenshots in a corresponding dataset for CV applications. You can find doc…  ( 2 min )
    [D] Is anyone working on open-sourcing Dall-E 2?
    Just like Eleuther did with GPT3? submitted by /u/invertedpassion [link] [comments]  ( 2 min )
    [P] Demo of Google's new PHORUM Image➠3D Figure Project
    submitted by /u/NichodonARG [link] [comments]
  • Open

    Chatbots
    Does anyone know where chatbots, specifically the ones on Chai, get their information? One knew what meta-analysis was and knew who the main character of a show was. I don't believe the bots are real people, based on both the information I've seen and my experience with them glitching out (plus Occam's razor, y'know). I'm guessing they can use Google or something, but it's odd because they clearly don't have access to the time. submitted by /u/Shadowfax42- [link] [comments]  ( 1 min )
    Arcane Style Transfer
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    The most basic AI scheme.
    submitted by /u/idvlknv [link] [comments]
    Nvidia AI Designs GPU 3,600x Faster | Breakthrough MRKL NLP Techniques | AI Mind To Predict Dementia
    submitted by /u/getrich_or_diemining [link] [comments]
    Is there an AI that is able to turn normal pictures into that kind of picture, so that I am able to plot them with my pen plotter?
    submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 1 min )
    What are cool AI tools which I could use to edit images/videos?
    (I have no experience with photoshop) submitted by /u/xXNOdrugsForMEXx [link] [comments]
    Are You The Asshole? New AI Mimics Infamous Advice Subreddit
    submitted by /u/Emmanuel_T_Goldstein [link] [comments]
    Bias in Artificial Intelligence; Is Diversity the Key to the Future Of AI?
    submitted by /u/JencyJane [link] [comments]
    Ai Becomes Sentient 3
    submitted by /u/webauteur [link] [comments]
    To be AI-first, do AI last
    submitted by /u/bendee983 [link] [comments]
    GPT-3 not available in my country
    Is there a way to access any of OpenAI's APIs if they aren't available in my country? submitted by /u/dogaryy [link] [comments]  ( 1 min )
    Text Summarization with Huggingface Transformers and Python
    submitted by /u/RubiksCodeNMZ [link] [comments]
    Reface app deepfake technology
    submitted by /u/Aggravating-Deal-260 [link] [comments]
  • Open

    AI Hitman Learns to Find Waldo
    submitted by /u/TernaryJimbo [link] [comments]
    Help with walk forward validation in LSTM problem
    Hi, I am having some trouble with walk-forward validation in my LSTM. The problem is described in this Stack Overflow post: https://stackoverflow.com/questions/71990833/using-predictions-instead-of-observed-values-in-walk-forward-validation-in-lstm If anyone could help me, that would be much appreciated. submitted by /u/magnussendjoko [link] [comments]  ( 1 min )
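    For readers with the same question: the recursive flavour of walk-forward validation (each forecast is fed back in as the input for the next step, instead of the observed value) can be sketched without any framework. The mean-of-window "model" below is a stand-in for the trained LSTM in the linked post; the window size and history values are made up for illustration:

    ```python
    # Minimal sketch of walk-forward forecasting where each prediction is fed
    # back as input for the next step, instead of the observed value.

    def walk_forward_forecast(model, history, n_steps, window):
        """Forecast n_steps ahead, recursively feeding predictions back in."""
        window_vals = list(history[-window:])   # last `window` observed values
        preds = []
        for _ in range(n_steps):
            yhat = model(window_vals)           # predict from the current window
            preds.append(yhat)
            window_vals = window_vals[1:] + [yhat]  # slide window: drop oldest, append prediction
        return preds

    # Stand-in model: predicts the mean of the window (replace with the LSTM's predict call).
    toy_model = lambda w: sum(w) / len(w)

    history = [1.0, 2.0, 3.0, 4.0]
    forecast = walk_forward_forecast(toy_model, history, n_steps=3, window=2)
    ```

    Note that because later inputs are themselves predictions, errors compound with the horizon, which is the usual trade-off of this scheme versus re-feeding observed values at each step.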
    I don't think this was posted here before but it's incredible. The latest image generation from text
    submitted by /u/gwtkof [link] [comments]
    NN from Scratch: #5 Updating parameters | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
  • Open

    Host Hugging Face transformer models using Amazon SageMaker Serverless Inference
    The last few years have seen rapid growth in the field of natural language processing (NLP) using transformer deep learning architectures. With its Transformers open-source library and machine learning (ML) platform, Hugging Face makes transfer learning and the latest transformer models accessible to the global AI community. This can reduce the time needed for data […]  ( 8 min )
    How Nordic Aviation Capital uses Amazon Rekognition to streamline operations and save up to EUR200,000 annually
    Nordic Aviation Capital (NAC) is the industry’s leading regional aircraft lessor, serving almost 70 airlines in approximately 45 countries worldwide. In 2021, NAC turned to AWS to help it use artificial intelligence (AI) to further improve its leasing operations and reduce its reliance on manual labor. With Amazon Rekognition Custom Labels, NAC built an AI […]  ( 5 min )
  • Open

    Estimating the informativeness of data
    MIT researchers can now estimate how much information data are likely to contain, in a more accurate and scalable way than previous methods.  ( 6 min )
    An easier way to teach robots new skills
    Researchers have developed a technique that enables a robot to learn a new pick-and-place task with only a handful of human demonstrations.  ( 7 min )
  • Open

    SimpleGrid env for OpenAI gym
    SimpleGrid is a simple gridworld environment for OpenAI gym. It is easy to use and customise, and it is intended to offer an environment for quickly testing and prototyping different RL algorithms. I developed this environment by taking inspiration from the FrozenLake environment and gym-minigrid. Check it out at: https://github.com/damat-le/gym-simplegrid ​ https://i.redd.it/prqd7muujqv81.gif submitted by /u/damat-le [link] [comments]  ( 1 min )
    Hybrid CPU topology impact on training
    Hi, some newer CPUs (Apple M1, Intel Alder Lake) have started using a hybrid CPU topology, i.e. the CPU consists of some P-cores (high-performance cores) and E-cores (slower 'eco' cores). I personally do not own such a CPU yet, but I'm considering upgrading to one soon, so I am looking for experiences of people training reinforcement learning models on such a CPU. Does everything work as expected? Are there annoyances? I'm using Ray Tune + RLlib, in which I assign a single core per trial. In this case I'm expecting that trials running on E-cores will simply run (much) slower than those running on P-cores. In case a single trial gets assigned both a P-core and an E-core, I do expect a serious slow-down. These are all guesses really, so I'm looking for people with actual experience here. Thanks. submitted by /u/katsu9 [link] [comments]  ( 1 min )
    Policy Iteration on OpenAI Gym taxi-v3
    Hey everyone, I managed to implement policy iteration from Sutton & Barto (2018) on FrozenLake-v1 and wanted to do the same on the Taxi-v3 environment. My code has been running for 45 min now, so I guess there is something wrong, but I can't wrap my head around what it could be. Would appreciate some input on what I need to change so that it will work. Please see my code here:

    ```python
    import gym  # OpenAI gym
    import torch
    import matplotlib.pyplot as plt
    from tqdm import trange  # progress bar

    torch.manual_seed(4)
    env = gym.make('Taxi-v3')

    def policy_evaluation(env: gym.Env, policy: torch.Tensor, gamma: float, threshold: float):
        V = torch.zeros(env.observation_space.n)
        delta = float("inf")
        while delta >= threshold:
            V_tmp = torch.empty(env.observation_space.n)
            for state in range(…
    ```
    ( 1 min )
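    For anyone debugging a similar run, here is a self-contained sketch of the full policy-iteration loop on a made-up two-state MDP (the transition table is illustrative, not Taxi-v3; on Taxi you would read `(prob, next_state, reward, done)` tuples from `env.P[state][action]`). Note the in-place value update, which converges in the same way as the two-array version but is simpler to get right:

    ```python
    # Tabular policy iteration (Sutton & Barto style) on a toy 2-state MDP.
    # P[s][a] = list of (prob, next_state, reward) transitions -- made up here.
    GAMMA, THETA = 0.9, 1e-8
    N_STATES, N_ACTIONS = 2, 2
    P = {
        0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
        1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
    }

    def evaluate(policy):
        V = [0.0] * N_STATES
        while True:
            delta = 0.0
            for s in range(N_STATES):
                v = sum(p * (r + GAMMA * V[ns]) for p, ns, r in P[s][policy[s]])
                delta = max(delta, abs(v - V[s]))
                V[s] = v  # in-place update
            if delta < THETA:   # delta recomputed every sweep, so this terminates
                return V

    def improve(V):
        return [max(range(N_ACTIONS),
                    key=lambda a: sum(p * (r + GAMMA * V[ns]) for p, ns, r in P[s][a]))
                for s in range(N_STATES)]

    policy = [0] * N_STATES
    while True:
        V = evaluate(policy)
        new_policy = improve(V)
        if new_policy == policy:
            break
        policy = new_policy
    ```

    On Taxi-v3 itself (500 states), this loop typically finishes in seconds, so a 45-minute run usually points at the evaluation loop never terminating (e.g. `delta` not being reset and recomputed each sweep) rather than at the size of the environment.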
    ICLR 2022 Blog Post: The 37 Implementation Details of Proximal Policy Optimization
    submitted by /u/vwxyzjn [link] [comments]  ( 1 min )
    Deep Reinforcement Learning Free Class by Hugging Face 🤗
    Hey there! We're happy to announce the launch of the Hugging Face Deep Reinforcement Learning class! 🤗 👉 Register here https://forms.gle/oXAeRgLW4qZvUZeu9 In this free course, you will: 📖 Study Deep Reinforcement Learning in theory and practice. 🧑‍💻 Learn to use famous Deep RL libraries such as Stable Baselines3, RL Baselines3 Zoo, and RLlib. 🤖 Train agents in unique environments with SnowballFight, Huggy the Doggo 🐶, and classical ones such as Space Invaders and PyBullet. 💾 Publish your trained agents in one line of code to the Hub. But also download powerful agents from the community. 🏆 Participate in challenges where you will evaluate your agents against other teams. 🖌️🎨 Learn to share your environments made with Unity and Godot. 👉 Register here https://forms.gle/oXAeRgLW4qZvUZeu9 📚 The syllabus: https://github.com/huggingface/deep-rl-class https://preview.redd.it/b409a9sscov81.jpg?width=1920&format=pjpg&auto=webp&s=ebdfa10c220b5a3dec17894bc0f955ed9d8f7634 If you have questions and feedback, I would love to answer them, Thanks, submitted by /u/cranthir_ [link] [comments]  ( 1 min )
    Has anyone solved deepmind control locomotion in state-based?
    Hi, I'm wondering if I could train state-based locomotion well from scratch. In my case, I trained a SAC agent and it showed divergence of the Q-function. I think the cause is the unscaled vector input. Before troubleshooting, I want to ask whether someone has trained an agent in this environment. Thank you for reading. https://github.com/deepmind/dm_control/tree/main/dm_control/locomotion submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    Can't solve OpenAI problems
    I started off with OpenAI's mountain car, and I don't know where to even start. I got the setup working with the environment, but now have no idea how to train it. How do I learn to code for RL? Every tutorial I have seen so far has implemented Q Learning algorithms completely with little explanation. I looked at the solved code and it doesn't make sense to me. How should I prepare before I go into OpenAI? submitted by /u/TrepidationTD [link] [comments]  ( 1 min )
  • Open

    Google at ICLR 2022
    Posted by Cat Armato and Callan Hajosy, Program Managers The 10th International Conference on Learning Representations (ICLR 2022) kicks off this week, bringing together researchers, entrepreneurs, engineers and students alike to discuss and explore the rapidly advancing field of deep learning. Entirely virtual this year, ICLR 2022 offers conference and workshop tracks that present some of the latest research in deep learning and its applications to areas ranging from computer vision, speech recognition and text understanding to robotics, computational biology, and more. As a Platinum Sponsor of ICLR 2022 and Champion DEI Action Fund contributor, Google will have a robust presence with nearly 100 accepted publications and extensive participation on organizing committees and in workshops…  ( 12 min )
  • Open

    Projective duality
    The previous post explained how to define a projective plane over a field F. Now let’s look at how we do geometry in a projective plane. Definitions We have a definition of points from the other post: a point is a triple (a, b, c) of elements of F, with not all elements equal to […] Projective duality first appeared on John D. Cook.  ( 3 min )
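    As background, the construction this post and the next one describe is standard and can be written compactly (a sketch of the usual definition, not a quotation from the articles):

    ```latex
    % The projective plane over a field F: nonzero triples up to scaling.
    \mathbb{P}^2(F) \;=\; \bigl( F^3 \setminus \{(0,0,0)\} \bigr) \big/ \sim,
    \qquad
    (a,b,c) \sim (\lambda a,\, \lambda b,\, \lambda c)
    \quad \text{for every } \lambda \in F \setminus \{0\}.
    % Duality: a line is likewise a nonzero triple [p:q:r] up to scaling,
    % consisting of the points (a,b,c) with
    p a + q b + r c = 0 .
    ```

    The symmetry of the incidence condition $pa+qb+rc=0$ between point triples and line triples is exactly the duality the post's title refers to.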
    Finite projective planes
    Given a field F, finite or infinite, you can construct a projective plane over F by starting with pairs of elements of F and adding “points at infinity,” one point for each direction. Motivation: Bézout’s theorem A few days ago I mentioned Bézout’s theorem as an example of a simple theorem that rests on complex […] Finite projective planes first appeared on John D. Cook.  ( 4 min )
  • Open

    PPE: A fast and provably efficient RL algorithm for exogenous noise
    Picture a person walking in a park by a pond. The surrounding environment contains a number of moving objects that change the quality of the environment: clouds moving to hide the sun, altering the quality of light; ducks gliding across the pond, causing its surface to ripple; people walking along a path, their images reflecting […] The post PPE: A fast and provably efficient RL algorithm for exogenous noise appeared first on Microsoft Research.  ( 9 min )
  • Open

    Let Me Shoyu How It’s Done: Creating the NVIDIA Omniverse Ramen Shop
    When brainstorming a scene to best showcase the groundbreaking capabilities of the Omniverse platform, some NVIDIA artists turned to a cherished memory: enjoying ramen together in a mom-and-pop shop down a side street in Tokyo. Simmering pots of noodles, steaming dumplings, buzzing kitchen appliances, warm ambient lighting and glistening black ledger stools. These were all Read article > The post Let Me Shoyu How It’s Done: Creating the NVIDIA Omniverse Ramen Shop appeared first on NVIDIA Blog.  ( 4 min )
    Stellar Weather: Researchers Describe the Skies of Exoplanets
    A paper released today describes in the greatest detail to date the atmospheres on distant planets. Seeking the origins of what’s in and beyond the Milky Way, researchers surveyed 25 exoplanets, bodies that orbit stars far beyond our solar system. Specifically, they studied hot Jupiters, the largest and thus easiest to detect exoplanets, many sweltering Read article > The post Stellar Weather: Researchers Describe the Skies of Exoplanets appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Let’s Talk about Machine Translation: The powering engine behind “Google Translate”
    At some point, we've all used Google Translate, Microsoft, DeepL, or Bing translators to impress our friends/colleagues who speak a different…  ( 4 min )
  • Open

    Should I Use Offline RL or Imitation Learning?
    Figure 1: Summary of our recommendations for when a practitioner should use BC and various imitation learning style methods, and when they should use offline RL approaches. Offline reinforcement learning allows learning policies from previously collected data, which has profound implications for applying RL in domains where running trial-and-error learning is impractical or dangerous, such as safety-critical settings like autonomous driving or medical treatment planning. In such scenarios, online exploration is simply too risky, but offline RL methods can learn effective policies from logged data collected by humans or heuristically designed controllers. Prior learning-based control methods have also approached learning from existing data as imitation learning: if the data is generally “goo…  ( 11 min )

  • Open

    [D] Legality of Hosting ImageNet
    Despite its immense popularity in academia, it's surprisingly difficult to download the ImageNet Object Localization dataset. As far as I can tell this is due to legal issues -- no single entity owns the copyright to the images, so no entity can host the whole dataset. The result is that if you want to use ImageNet you're forced to either manually scrape a million URLs (requiring both CPU time and your time, and imposing costs on a million unsuspecting websites), know somebody who has already done that, or fetch it from a legally questionable source. So I have a couple questions: Is the owner of the ImageNet dataset on Kaggle performing a selfless public service, thanklessly accepting legal liability to make ImageNet more accessible? Or is she protected (e.g. by fair use)? Is she required to accept DMCA requests? If I'd like to share ImageNet (I've recently downloaded it and processed it to be ~10 GB, which seems like a helpful thing to share), is there any legally safe path for me to do this? submitted by /u/you-get-an-upvote [link] [comments]  ( 3 min )
    [D] Machine Learning - WAYR (What Are You Reading) - Week 136
    This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in wiki. Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks : 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100 101-110 111-120 121-130 131-140 Week 1 Week 11 Week 21 Week 31 Week 41 Week 51 Week 61 Week 71 Week 81 Week 91 Week 101 Week 111 Week 121 Week 131 Week 2 Week 12 Week 22 Week 32 Week 42 Week 52 Week 62 Week 72 Week 82 Week 92 Week 102 Week 112 Week 122 Week 132 Week 3 Week 13 Week 23 Week 33 Week 43 Week 53 Week 63 Week 73 Week 83 Week 93 Week 103 Week 113 Week 123 Week 133 Week 4 Week 14 Week 24 Week 34 Week 44 Week 54 Week 64 Week 74 Week 84 Week 94 Week 104 Week 114 Week 124 Week 134 Week 5 Week 15 Week 25 Week 35 Week 45 Week 55 Week 65 Week 75 Week 85 Week 95 Week 105 Week 115 Week 125 Week 135 Week 6 Week 16 Week 26 Week 36 Week 46 Week 56 Week 66 Week 76 Week 86 Week 96 Week 106 Week 116 Week 126 Week 7 Week 17 Week 27 Week 37 Week 47 Week 57 Week 67 Week 77 Week 87 Week 97 Week 107 Week 117 Week 127 Week 8 Week 18 Week 28 Week 38 Week 48 Week 58 Week 68 Week 78 Week 88 Week 98 Week 108 Week 118 Week 128 Week 9 Week 19 Week 29 Week 39 Week 49 Week 59 Week 69 Week 79 Week 89 Week 99 Week 109 Week 119 Week 129 Week 10 Week 20 Week 30 Week 40 Week 50 Week 60 Week 70 Week 80 Week 90 Week 100 Week 110 Week 120 Week 130 Most upvoted papers two weeks ago: /u/CatalyzeX_code_bot: Paper link /u/lauren_v2: paper Besides that, there are no rules, have fun. submitted by /u/ML_WAYR_bot [link] [comments]  ( 1 min )
    [P] Showcase your Machine Learning Research/Projects in Hugging Face Spaces using Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [D] How does the NVIDIA P100 on Google Colab Pro compare to a laptop with an RTX 3080 (Mobile or Max-Q)?
    submitted by /u/aviisu [link] [comments]  ( 6 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 2 min )
    [D] What is the best optimizer to use when visualizing internal network neurons by optimizing a random input with respect to them?
    Hi, it seems to have become a staple of conv-net inner-working visualization to put a network in eval mode, sample a random noise image, and optimize the image with respect to the activation of some internal neuron. From what I saw, most examples of this on the internet use an Adam optimizer with a learning rate of 0.1 and a weight decay of 1e-6. This doesn't seem quite right to me, so if any of you know the source for this convention, and whether there are other alternatives, I'd appreciate this information very much. Thanks! submitted by /u/ondrea_luciduma [link] [comments]  ( 1 min )
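    For context, the procedure the post describes reduces to gradient ascent on an activation with the input as the only trainable parameter. The toy scalar "neuron" below stands in for a real conv-net unit so the loop runs without a framework; with PyTorch you would instead call `image.requires_grad_(True)` and pass `[image]` to `torch.optim.Adam(..., lr=0.1)`. Plain ascent (as here) or SGD also works in practice, usually with extra regularizers such as jitter or total-variation penalties:

    ```python
    # Activation maximization in miniature: ascend the gradient of a
    # "neuron" with respect to its input. The neuron here is a toy scalar
    # function peaked at x = 3, so the optimum is known in closed form.

    def neuron(x):            # toy activation
        return -(x - 3.0) ** 2

    def grad(x):              # analytic gradient of the toy activation
        return -2.0 * (x - 3.0)

    x, lr = 0.0, 0.1          # noise-ish init; lr=0.1 as in the convention the post mentions
    for _ in range(200):
        x += lr * grad(x)     # ascent step: move x toward higher activation
    ```

    After 200 steps `x` sits at the activation's peak; the same loop with `Adam` on an image tensor is what the tutorials the post refers to are doing.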
  • Open

    Projects to jump into RL?
    I have some experience with ML but not at all with RL. I know basic theory and want to just get started with the programming part. I saw OpenAI's gym, but I want to learn RL that can be applied anywhere. I don't want to be specifically constrained to OpenAI's gym and want something where I can apply it to game development, such as Unity. Are there any good resources to just do an RL project? submitted by /u/TrepidationTD [link] [comments]  ( 1 min )
    N Step Prioritized Replay Buffer
    I have a few questions about implementing the N Step version of Prioritized Replay Buffer (for Rainbow DQN). I'm implementing the Atari version of this buffer. To conserve memory, I'm only storing the states (and the last state of each episode) in an unstacked manner. That is, if the frame stack is 4, and the shape of states returned by the environment is 4, I'm storing only the last state of the stacked states. This way the buffer contains all transitions from each episode. As for the N Steps part, I'm only calculating the n_step states when getting states from the buffer instead of storing the N Step transitions directly. For the prioritized version, how do the priorities work? If I wasn't trying to conserve memory, I would have stored the N Step stacked states directly and update the priorities for the segment trees and the segment tree pointers only when moving the data from the N Step buffer to the main buffer. But now that I'm calculating the N Step experience directly when sampling and not when adding data to the buffer, when do I update the priorities and the tree pointer? Once per state? Once per stacked state? Once per N Step stacked state? When I update the priorities for the sampled batch, the priorities are associated with multiple states (because the states are stacked). But because I don't store the stacked states and only the raw unstacked states, to which of these states should I update the priority for? And because I don't store the n step transitions anymore, to which of the N Steps should the priorities be associated with? If I want to create such a buffer for a vector env, how would I go about doing it? I'm thinking of maintaining a separate segment tree for each env. Is that correct? Is there a better way? submitted by /u/SirRantcelot [link] [comments]  ( 1 min )
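    On the "compute n-step transitions at sampling time" part of the question above, one common convention is to anchor both the n-step return and the priority at the *starting* index of the sampled transition (one priority per sampled start, not per frame of the stack). A framework-free sketch, with made-up rewards:

    ```python
    # Lazily compute an n-step transition from a flat buffer of
    # (reward, done) entries, truncating at episode boundaries.
    GAMMA = 0.99

    def n_step_return(buffer, i, n):
        """Return (G, last_index) for an n-step lookahead starting at index i."""
        G, last = 0.0, i
        for k in range(n):
            r, done = buffer[i + k]
            G += (GAMMA ** k) * r
            last = i + k
            if done:            # episode ended early: truncate the lookahead
                break
        return G, last

    buffer = [(1.0, False), (0.0, False), (2.0, True), (5.0, False)]
    G, last = n_step_return(buffer, 0, 3)      # spans the whole episode
    G2, last2 = n_step_return(buffer, 2, 3)    # episode ends immediately
    ```

    Under this convention the segment-tree priority for a sampled batch element is updated once, at index `i`, using the TD error computed from `G` and the bootstrap at `last + 1`; this is an assumption about Rainbow-style implementations generally, not a claim about the poster's code.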
    Confused between "centralized critic" and "centralized training decentralized execution"
    I don't understand if having a centralized critic in multi-agent RL is the same as having a centralized training decentralized execution approach. Can you help me clarify this? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    There's a "Yo Mama" joke in there somewhere!!
    submitted by /u/boss_007 [link] [comments]  ( 1 min )
    _Algorithms for Decision Making_, Kochenderfer et al 2022 (textbook draft; more classical ML than S&B)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Solving a large (dynamic) maze using DQN (reward & observation)
    Consider a fixed maze of size (100x400) with fixed start and end points (1,1) and (99,399), where doors dynamically appear around the agent every time an observation has been made. The doors disappear after some time step n (for simplicity say n=1). For pity's sake let's say the optimum path length is 1000 steps and there is only one way to reach it. I have two questions: (1) What would be the most appropriate way to frame the reward function? I have tested both a volatile approach (i.e. the episode is terminated if the agent steps into a wall or a previously made path, otherwise it is rewarded in "cheese" every 5 steps) and a not-so-volatile approach (i.e. free roam for a maximum of 2000 steps with a cheese reward every 5 steps). Evidently these do not work; are there other, perhaps more promising, approaches for such large mazes? (2) Does the colour of objects in a frame matter to the (convolutional) DQN's learning of the maze? Considering a 3-channel RGB input, is there perhaps a smarter way to colour-code the walls/empty path/etc.? It's a bit of a weird question, but my curiosity arises from short-term memory examples (namely, Space Invaders) where the input, multiple greyscale frames, is chosen over one RGB frame - so to this degree channels can be understood as higher-level "features" of a frame. Hence, can the colours of different objects be optimised? If so, is there a logical way of viewing such optimisation? Any other advice is always welcome :) ​ N.B. yes, this could probably be solved with other easier methods, but let's just say DQN (or other deep-Q alternatives) is needed. submitted by /u/Background-Cable-491 [link] [comments]  ( 2 min )
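    One alternative to the "cheese every 5 steps" schemes in the question above is potential-based shaping, which adds a term of the form gamma*phi(s') - phi(s) to a sparse goal reward and provably leaves the optimal policy unchanged (Ng et al., 1999). A sketch, using negative Manhattan distance to the goal as the potential and the maze's stated endpoint; the step penalty and goal bonus are made-up values:

    ```python
    # Potential-based reward shaping for a grid maze.
    GOAL = (99, 399)
    GAMMA = 1.0   # undiscounted here for readability; with a discounted MDP use its gamma

    def phi(pos):
        """Potential: closer to the goal => higher value."""
        return -(abs(GOAL[0] - pos[0]) + abs(GOAL[1] - pos[1]))

    def shaped_reward(pos, next_pos, step_penalty=-0.01):
        r = 100.0 if next_pos == GOAL else step_penalty   # sparse base reward
        return r + GAMMA * phi(next_pos) - phi(pos)        # shaping term

    r_toward = shaped_reward((1, 1), (2, 1))  # one cell toward the goal
    r_away = shaped_reward((2, 1), (1, 1))    # one cell away from the goal
    ```

    Moving toward the goal yields a positive shaped reward and moving away a negative one, giving the agent a dense learning signal without terminating episodes on every wall contact.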
    RL for classification problems
    Hello, I aim to use RL for a classification problem, but I can't see the difference between using RL and other ML algorithms used for classification (such as MLP, KNN, SVM ...), since we have a training phase in which we teach an agent (or ML algorithm) the class of each sample of a labeled dataset. OK, the manner of teaching is different, but the concept is the same. Then, in a second phase, we test the model with a test set. My question is: if I choose RL for a classification problem, what contribution can it make compared to another algorithm? submitted by /u/fatenLouati [link] [comments]  ( 2 min )
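    To make the comparison in the question above concrete: classification cast as RL is usually a contextual bandit, where the state is the feature vector, the action is the predicted class, and the label only reaches the learner as a scalar reward. The tabular epsilon-greedy learner below is deliberately tiny (a two-point toy dataset, made up for illustration); the point is the interface difference versus a supervised loss, not the learner itself:

    ```python
    # Classification as a contextual bandit with tabular epsilon-greedy learning.
    import random
    random.seed(0)

    data = [(0, 0), (0, 0), (1, 1), (1, 1)]   # (feature, label); feature is already discrete
    Q = {(x, a): 0.0 for x in (0, 1) for a in (0, 1)}
    eps, alpha = 0.1, 0.5

    for _ in range(500):
        x, y = random.choice(data)
        a = random.choice((0, 1)) if random.random() < eps \
            else max((0, 1), key=lambda a: Q[(x, a)])
        r = 1.0 if a == y else -1.0            # the label acts only as a reward signal
        Q[(x, a)] += alpha * (r - Q[(x, a)])   # bandit update: no bootstrapping

    greedy = {x: max((0, 1), key=lambda a: Q[(x, a)]) for x in (0, 1)}
    ```

    The practical contribution appears when the feedback really is bandit-like (you only observe the reward for the class you predicted, e.g. in online recommendation) or when rewards are delayed; with full labels available up front, a supervised loss uses strictly more information per sample.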
    Design of next observation when collision for 2d continuous maze
    Hi, I am trying to create a continuous 2D maze environment. I tried several algorithms, but none gives me a stable success rate of 1. From time to time, the agent gets stuck around an obstacle corner and fails to move anymore, like the picture below shows. I guess it relates to my poor design of the next observation returned for an action that can't make the agent move forward. Currently, my design evenly separates the action into 10 substeps and returns the observation corresponding to the step right before the agent meets an obstacle. It looks like I won't have this issue in a MuJoCo environment like Ant, but it is really hard to see their design. https://preview.redd.it/swohj8h8qev81.png?width=282&format=png&auto=webp&s=7353a62e99ff2141b3660c68ae34ce4f5cd9b94f submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
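    For concreteness, the substep design described above looks roughly like this (single axis-aligned toy wall, made-up numbers):

    ```python
    # Split a continuous move into k substeps and stop at the last
    # collision-free position, as the post describes.

    def in_obstacle(p):
        x, y = p
        return 0.4 <= x <= 0.6 and 0.0 <= y <= 0.5   # toy wall

    def step(pos, delta, k=10):
        x, y = pos
        dx, dy = delta[0] / k, delta[1] / k
        for _ in range(k):
            nxt = (x + dx, y + dy)
            if in_obstacle(nxt):
                break        # keep the last valid substep
            x, y = nxt
        return (x, y)

    new_pos = step((0.0, 0.25), (1.0, 0.0))   # move right into the wall
    ```

    One common tweak when an agent stalls at corners is to pair this with a small collision penalty, so that "pushing into the wall" is not reward-neutral and the policy has a gradient away from the stuck state; whether that fixes the poster's case is of course an assumption.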
    Why can't we make a perfect AI for Starcraft through evolution
    First of all, let's discuss what the level of AI is now. If "level" refers to competitive capability, current AI has come very close to top human players in several types of games: chess; Texas Hold'em and Mahjong among card games; DOTA2 among MOBAs; and StarCraft 2 among RTS games. For other games, given enough human resources and compute, we can also get similar results. If "level" has other meanings - like AI agents exhibiting human behavior, or intelligent NPCs designed specifically for different people so that they can have different gaming experiences - these are all at the stage of defining the problem and exploring new technology solutions. Although traditional game AI is mostly based on hard-coded rules, it still embeds much prior knowledge. In recent years, some hot ML-r…  ( 4 min )
  • Open

    India and US have decided to advance cooperation in emerging technologies in the fields of communication, artificial intelligence
    submitted by /u/dannylenwinn [link] [comments]  ( 1 min )
    Research on AI!
    Dear all, Currently I am writing my thesis on the effects of Artificial Intelligence (AI) on employee performance through employee engagement. If you work in a company that uses AI and you: - have a direct relationship (e.g. data scientist) or indirect relationship (e.g. business manager) with AI - often or sometimes use AI in your daily work then your insights are essential for my master thesis research and I would really appreciate it if you would fill out the survey below (5-10 min / English & Dutch translation). https://uva.fra1.qualtrics.com/jfe/form/SV_4TTnzJE4IQUoHWK Please feel free to spread the survey to as many relevant people you may know. Much appreciated!! Britt submitted by /u/BrittHermans [link] [comments]  ( 1 min )
    AI Dream 44 - Epic Cathedral Supernatural Visit
    submitted by /u/LordPewPew777 [link] [comments]
    In the deep
    submitted by /u/Hacknaut [link] [comments]
    Nota AI Introduces New Machine Learning Tools Under Its NetsPresso Platform For Automatically Searching Optimized Models And Making Compression Process Easy And Fast
    ​ https://preview.redd.it/beo6czc6hhv81.png?width=1706&format=png&auto=webp&s=385f65ebfed9344781d1f9238d8d508dae27c0a3 In the last decade, AI research has brought astonishing results in many fields, and, undoubtedly, AI is nowadays a central technology in many aspects of our life. As new ideas are proposed every day, this continuous research usually comes with infinite applications: from the algorithms assisting surgeons in complex operations to the one which allows unlocking our phone using just our face. In this evolution from the idea to the actual implementation, it is often ignored how hard the passage between theoretical research and working application is. We can refer to this process as AI Development Cycle for Edge AI and can be divided into three phases related to 1) data, 2) model, and 3) evaluation. Many aspects must be considered: first, each different AI application requires a specific dataset. For this reason, in this step, the aim is to prepare the data, which, as is well known, is one of the crucial topics of AI: a good algorithm always relies on a good dataset. This phase can be divided into data collection, curation, labeling, and preparation. Continue reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    [D] MindSpore AI Scientific Computing Series (15): Protein Function Prediction
    submitted by /u/Creative_Habit_6868 [link] [comments]
    Training an AI Hitman To Find Waldo
    submitted by /u/TernaryJimbo [link] [comments]
    Kanye West - Through the Wire
    ​ https://preview.redd.it/qaqevbqlfev81.png?width=1024&format=png&auto=webp&s=8c2759d29a3713c2ffb998e20e8995cee4d8b2b4 submitted by /u/Smek_dev [link] [comments]

  • Open

    Time Series Analysis for air pollution data not aligned [R] [P]
    This is about a project that I am working on; I hope the ML community can help me! I have collected a few hours of air pollutant data using Aeroqual sensors and custom-made sensors. Three types of data are available in the project: Aeroqual, custom, and council data, where the council data can be taken as ground truth (it comes from a government-installed, high-spec sensor). Aeroqual is a commercial sensor manufacturing company, so its data should be accurate. The first part of the project is about checking the accuracy of the custom sensor. So, I have done a few analyses on the data and found that the custom sensor data has similarity (but is not the same; there is so much variation in the custom sensor data) with the council sensor data, but the Aeroqual data is way different. I am attaching the plot below. ​ So I need to know: is there any method by which I can find the relationship between these three datasets? Is it possible to align these data together? I need to build an ML model to predict the air pollutant level using this data. Any tips for getting this working? - Thanks in advance ​ https://preview.redd.it/fzu7dbsz0dv81.png?width=885&format=png&auto=webp&s=516d65fe3290ac8a28159547880f9dc972922b64 submitted by /u/Codename_17 [link] [comments]  ( 1 min )
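    A common first step for comparing sensors that report on different clocks, as in the question above, is to resample every series onto a shared time grid and then compare the binned means (pandas' `resample("1H").mean()` does this in one line; the stdlib version below shows the mechanics). Timestamps are minutes and the readings are made up:

    ```python
    # Bin irregular (timestamp, value) series into hourly means on a shared grid.
    from collections import defaultdict

    def to_hourly(series):
        """series: list of (minute_timestamp, value) -> dict hour -> mean value."""
        bins = defaultdict(list)
        for t, v in series:
            bins[t // 60].append(v)
        return {h: sum(vs) / len(vs) for h, vs in bins.items()}

    custom  = [(5, 10.0), (20, 14.0), (70, 30.0)]   # custom sensor readings
    council = [(0, 11.0), (59, 13.0), (65, 29.0)]   # council sensor readings

    c1, c2 = to_hourly(custom), to_hourly(council)
    shared = sorted(set(c1) & set(c2))   # hours where both sensors reported
    pairs = [(c1[h], c2[h]) for h in shared]
    ```

    Once the series are on the same grid, a correlation or a fitted linear calibration (custom vs. council) quantifies how far the custom sensor is from ground truth, which is what the accuracy check needs.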
    [D] For training a HAAR cascade is it better to manually remove noise from positive training images or to leave it in so the data is more realistic?
    submitted by /u/Counter-Business [link] [comments]  ( 1 min )
    [D] How to convert papers to code?
    My problem is probably what you have guessed: it's understanding the technical specifications, which are usually written in a non-coding-friendly way. Sometimes crucial information is completely missing from the paper, e.g. the loss function description for a DL algorithm. In the lucky cases where implementations of a given paper are already available on GitHub, they are usually either so different from one another in code structure that it calls into question their validity, or whether they match what the paper's authors intended (especially given varying measured results), or they are almost exact copies of one another. There are numerous examples where I can show specific papers with varying degrees of complexity and discuss why the conversion can be tricky, but they may require standalone discussions themselves, likely outside the scope of this one. Is there a way to approach the problem assuming the absence of reference code? submitted by /u/shine-box [link] [comments]  ( 2 min )
    Open Source Model For Identifying Extremism Online [Project]
    submitted by /u/OppositeMonday [link] [comments]
    [P] Tired of manually sending meeting minutes
    I host an important org-level meeting (~100 attendees) every week and need to share minutes after the meeting. I am so tired of listening to the conversations again just to capture important points and summarise discussion and action items. Is there any model/API which can help me do that? I use Amazon Transcribe to generate transcripts, which helps, but it is not very accurate. For me the priorities would be: 1) a model/API which is better than Amazon Transcribe; 2) auto-identifying speakers / speaker diarization (since mostly the same set of people speak); 3) summarising the conversations into topics (we have time- and agenda-based discussion). I am sure this is a problem across the industry, since most meetings happen online and someone wastes hours after the meeting sending notes. I did find some tools which summarise the transcript, but I need to auto-send in a specific format and identify topics based on the conversation (maybe we can input the agenda in advance). Also, this is private information, so I need something on-premise; hence I'm looking for a repo or model which I can use to build something on top of. Please let me know if something exists or someone is working on similar projects. Happy to collaborate and contribute. submitted by /u/super_commando-dhruv [link] [comments]  ( 1 min )
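    For the "summarise the transcript" part of the request above, a fully on-premise baseline is frequency-based extractive summarisation: score each sentence by the corpus frequencies of its content words and keep the top k. Real deployments would swap this for a local abstractive model (e.g. a Hugging Face summarisation pipeline), but this stdlib sketch, with a made-up toy transcript, shows the shape of the task:

    ```python
    # Frequency-based extractive summarisation with the standard library only.
    import re
    from collections import Counter

    STOP = {"the", "a", "an", "is", "to", "and", "of", "we", "will"}

    def summarize(text, k=1):
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
        freq = Counter(words)
        scored = sorted(sentences,
                        key=lambda s: sum(freq[w] for w in re.findall(r"[a-z']+", s.lower())),
                        reverse=True)
        top = set(scored[:k])
        return " ".join(s for s in sentences if s in top)  # keep original sentence order

    transcript = ("The launch is delayed. We will move the launch to May. "
                  "Lunch was good.")
    summary = summarize(transcript, k=1)
    ```

    Diarization and topic segmentation would still need a dedicated model, but an extractive pass like this already keeps everything inside the firewall, which was the stated constraint.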
    [R] ?? Can you find out which news article is written by AI ??
    This research will test the human ability to distinguish human-written text from text generated by artificial intelligence. Participating takes only 10 minutes. You will receive two short news articles about the same topic: one written by a human, the other generated by artificial intelligence. It is up to you to find out which one was written by artificial intelligence. You will be asked to do this for four different subjects, namely: Science, Economics & Politics, Society, and Sports. At the end of the survey you will receive feedback on how well you performed. The human-written articles were collected from various news websites. The AI-generated articles were produced using GPT-3 from OpenAI. Purpose of the research: we are trying to find out how well GPT-3 performs across subjects. Are there subjects GPT-3 is better at writing about, or is it equally good across all of them? Secondly, we are testing GPT-3's ability to generate articles about events that happened after the model was trained. You can participate by clicking on the link below; thank you very much for your participation. https://vub.fra1.qualtrics.com/jfe/form/SV_b2E9f6hGxNDH13M submitted by /u/RobinSandersVUB
    [R] I need to run >2000 experiments for my PhD work. How much would 2000 GPUs for 1 day cost?
    2000 GPUs and 8000 CPUs. And where could I even get access to resources at that scale? submitted by /u/samlerman
    [P] Vectorflow is a minimalist neural network library optimized for sparse data and single machine environments open sourced by Netflix
    submitted by /u/ur_mum_goes_to_uni
    [Project] Face detection algorithms comparison
    I selected 5 ready-made face detection algorithms and compared them against each other on metrics such as precision, recall, IoU, and runtime, using a dataset I annotated myself. I am ready to accept your pull requests with your own solutions (algorithms) and results! GitHub: https://github.com/wb-08/face-detection-algorithms-comparison Blog post: https://habr.com/ru/post/661671/ submitted by /u/wb-08
    [Discussion] Writing production grade code for ML in python
    I have been interviewing for a machine learning lead position. I have successfully passed 3 interview rounds (coding, HR, system design). I have my final interview with the VP of Engineering. When I asked how best to prepare, they said they would like to test my ability to write "production quality" code in Python. While I do have some experience, the downside is that I worked in small R&D teams for a long time. Though I am knowledgeable in Python, I may not have followed all the industry best practices. If you are a hiring manager or interviewer, how would you test this ability? How do I prepare myself to prove my ability to write production-grade code? Thank you all so much in advance. submitted by /u/mbkv
    [D] Comparing the efficiency of different GAN models
    I'm comparing different GAN models (CGAN, DCGAN, WGAN, StyleGAN) in TensorFlow 2. In general, I want to use the images generated by the generator to train a classifier, so they should be as realistic as possible. At first, I wanted to train each model for 24 hours, define some early stopping criterion, and save the checkpoint with the lowest loss through a callback. But it seems that a lower loss does not always lead to more realistic images. So how do I compare the different models in a scientific way, given that the results depend heavily on the epoch I choose and on my subjective feeling about which images look best? submitted by /u/Bonkikong
    [P] Artificial Nightmares: Split Personality || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    https://www.youtube.com/watch?v=2E_6ARbrMmc submitted by /u/Thenamessd
    [N] Google's new AI image analysis is pretty LiT - and beats OpenAI's CLIP
    submitted by /u/much_successes
    [P] A Simpler @PyTorch Annotated Implementation of EleutherAI's 20B Language Model GPT-NeoX.
    Github: https://github.com/labmlai/neox Annotated implementation: https://lit.labml.ai/github/labmlai/neox/tree/main/src/neox/__init__.py Original repo from EleutherAI: https://github.com/EleutherAI/gpt-neox We have included samples showing how to generate text and how to fine-tune. To keep things simple, we haven't included a number of optimizations that were present in the original GPT-NeoX. submitted by /u/hnipun
    [P] treequeues: transfer JAX pytrees between processes at very high speed!
    Hello! If you are using JAX and you need to pass pytrees between processes, I may have something for you :) I developed a "treequeue": a queue made for a pytree's nested arrays. The transfer speed is up to 10 times higher than regular queues. This is achieved by using shared-memory arrays and avoiding pickling the data. It can be very useful when developing distributed architectures, e.g. distributed reinforcement learning, where speed is of the utmost importance. In my case this implementation was very useful for removing bottlenecks when implementing PBT algorithms! https://github.com/thomashirtz/treequeues Cheers! submitted by /u/krenast
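    For anyone curious how the pickling-free transfer works in principle, here is a minimal single-process sketch using Python's standard multiprocessing.shared_memory with NumPy. This is not the treequeues API itself, just the underlying idea: back each leaf array with a named shared block so a receiver can attach by name with no pickling or copying.

```python
from multiprocessing import shared_memory
import numpy as np

# "Sender" side: allocate a named shared block and view it as an ndarray.
shm = shared_memory.SharedMemory(create=True, size=6 * 4)  # 6 float32s
src = np.ndarray((2, 3), dtype=np.float32, buffer=shm.buf)
src[:] = np.arange(6, dtype=np.float32).reshape(2, 3)

# "Receiver" side (normally another process): attach by name.
# The data is never pickled or copied -- both views share one buffer.
attached = shared_memory.SharedMemory(name=shm.name)
dst = np.ndarray((2, 3), dtype=np.float32, buffer=attached.buf)
roundtrip = dst.copy()  # copy out before releasing the block

attached.close()
shm.close()
shm.unlink()
print(roundtrip.tolist())
```

A queue built on this idea only needs to send the block name and the pytree structure through a regular channel, while the bulk array data stays in shared memory.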
    [D] ‘auton-survival’ package for deep survival analysis and time to event regression from CMU.
    Comes with a ‘white paper’ and example notebooks… seems legit..? Anyone tried this out yet? GitHub | Paper submitted by /u/proportional-hazard
    [P] Unofficial ViT-VQGAN implementation
    I know that many people (including me) were surprised after seeing the image quality of ViT-VQGAN and disappointed to learn that no source code would be released. Therefore, I've decided to implement it myself, and here is the code. I hope this can help everyone as a starting point for ViT-VQGAN. submitted by /u/ThunaClone
    [R][P] StyleGAN-Human: A Data-Centric Odyssey of Human Generation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971
    [D] Review of end-to-end multi-modal deep learning approach for autonomous navigation
    In reviewing various approaches to end-to-end deep learning for autonomous driving, I've come across an interesting approach in this paper that I would like to discuss with others. I will begin by summarizing the approach: A ResNet50 architecture is used as an encoder network, with the input being an RGB image and a depth map concatenated as (224 x 224 x 4); the paper argues that a point cloud or some other sensor modality would also work. The encoder output (a feature map of 7 x 7 x 2048) is fed into a decoder network that takes it back to (224 x 224 x 5), with pixel-wise semantic segmentation over 5 classes: lane, road line, sidewalk, vehicles or pedestrians, and others. That same encoder output (feature map of 7 x 7 x 2048) is global average pooled to 2…
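    The global-average-pooling step in that summary is simply a mean over the two spatial dimensions of the feature map. A quick NumPy sanity check of the shapes described above (the array contents are placeholders):

```python
import numpy as np

# Encoder output as described in the summary: a 7 x 7 x 2048 feature map.
feat = np.zeros((7, 7, 2048), dtype=np.float32)

# Global average pooling: average over the two spatial axes,
# collapsing the map to a single 2048-d descriptor.
pooled = feat.mean(axis=(0, 1))
print(pooled.shape)
```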
    mGPT: Few-Shot Learners Go Multilingual
    submitted by /u/Illustrious_Row_9971
    Biological feedback will save us all
    DALL-E 2. Excellent. It's very high quality. But it's a combination of the data. What did we want? We wanted some amazing work that makes us cry with just one line of writing or one image. "Oh, it copied well. It's pretty much the same." That's not enough. But how can that be improved? I think the answer is the feedback method. The current methods of evaluating writing, images, video, and sound are too indirect: sales revenue, number of subscribers, number of views, like/dislike ratio, ratings by section, revisit rate (these are better than the others), emotion analysis of comments using AI, internal staff scores. There are so many conditions other than the quality of the content itself in which people's judgment can intervene. In the first place, people don't express exactly what they…
    16 images generated for text prompt "Woah there, Dragonman!" using a text-to-image AI model from CompVis that uses latent diffusion (crosspost of another user's post)
    submitted by /u/Wiskkey
    NVIDIA Instant NeRF: Turn Photos into 3D Scenes in Milliseconds! Video demo
    submitted by /u/OnlyProggingForFun
    Google researchers create animated avatars from a single photo
    submitted by /u/SpatialComputing
    MIT's new machine-learning system M2I may someday help driverless cars predict the next moves of others
    submitted by /u/qptbook
    Human-like AI: where should I start?
    Hello there, if one wanted to get into AI, and especially human-like AI, would you still recommend getting into machine learning first? As far as I know, machine learning doesn't even try to develop "human-like" AI / "bottom-up" AI, but rather focuses on training algorithms to solve specific problems. I know human-like AI is highly complex and we still need years, if not decades, to achieve something even close to it, but I would appreciate tips and ideas nonetheless. (After reading through my question again, this sounds like a generic question that's asked here every day; if that's the case, please send me a link to a similar post if there is one :) ) submitted by /u/Garic152
    Help with a project idea
    Hi everyone, I'm doing a project with my friends where we should use computer vision/IoT to create a solution for people with disabilities or in the healthcare system. Any ideas, please? submitted by /u/armyy__
    Meta AI Researchers Built An End-To-End Machine Learning Platform Called Looper, With Easy-To-Use APIs For Decision-Making And Feedback Collection
    From improving the user experience to making the computational infrastructure more effective, AI is a crucial aspect of making current software systems and products perform as well as possible. AI is often more effective than even precisely developed human-crafted heuristic tactics today, whether it’s reducing latency, boosting the quality of a video stream, or streamlining the interfaces to match a specific person’s demands. But, to use AI more effectively in various products, several challenges must be addressed: the system must accommodate software engineers without machine learning backgrounds; it must provide mechanisms to optimize for a variety of product goals, which may differ from closed-form machine learning loss functions; it must distinguish causal connections from data correlations; and it must scale efficiently to train, host, and monitor vast numbers of AI models. Meta researchers developed ‘Looper,’ an end-to-end AI platform designed with easy-to-use APIs for optimization, personalization, and feedback collection to address these needs. Looper supports the entire machine learning lifecycle, from model training to deployment and inference to product evaluation and optimization. Looper allows existing products to be modified to leverage AI for personalized optimizations rather than having to rebuild them around AI models. Currently, the Looper platform hosts 700 AI models and produces 4 million AI outputs every second. Continue reading Paper: https://arxiv.org/pdf/2110.07554.pdf submitted by /u/No_Coffee_4638
    Are there any programs that can output a sentence based on input sentences?
    I'm looking to create a way to automate original story ideas based on previous ones. I want to be able to input 1000+ original sentences and get as output an original sentence inspired by the previous ones. Are there any programs that can do this, or will I need to develop my own? submitted by /u/yea_okay_dude
    Ultimate Guide to Activation Functions
    submitted by /u/SirFletch
    How to stop a stable-baselines model during training exactly at the end of an episode?
    I am training a PPO2 model with the stable-baselines library. I have tabular data with 15000 rows, so the episode length is 15000. I am using nminibatches=4, n_envs=1, and have set, for example, total_timesteps=10000. During training the agent will see the 15000 rows several times and update its actions for each row, but at some point the remaining total_timesteps will not be enough to see the full episode, and only part of the episode is available in the last stage of learning. To be concrete, for simplicity, say we have 10 rows and 23 total_timesteps. The agent will see the full episode 2 times and only the first 3 rows on the third pass; the remaining 7 rows are not seen in the last stage. I want to stop the learning process when the agent completes its last full episode (in the example above, stop learning at total_timesteps=20), or define total_timesteps in such a way that full episodes are seen at the end of training. submitted by /u/Mariam_Dundua
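    The second option (choosing a budget that ends on an episode boundary) is just integer arithmetic; a tiny sketch with a hypothetical helper:

```python
# Hypothetical helper: round a timestep budget down to a whole
# number of episodes, so training never stops mid-episode.
def full_episode_timesteps(budget: int, episode_len: int) -> int:
    return (budget // episode_len) * episode_len

# The toy example from the post: 10-row episodes, 23-timestep budget
# -> 2 full episodes, so stop at 20 timesteps.
print(full_episode_timesteps(23, 10))
# The actual numbers: a 10000 budget is smaller than one 15000-row
# episode, so no full episode fits at all -- the budget needs to be
# at least one multiple of 15000.
print(full_episode_timesteps(10000, 15000))
```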
    New to RL
    Hello guys, I am pretty new to the RL field and right now I am doing my thesis in it. I've come across a problem in my code. I created a custom environment, and when I try to solve it with my DQN agent using Stable-Baselines3, the code executes and prints the required things, but the agent is not learning. Any help? Thanks. submitted by /u/last_2_brain_cells97
    Questions on policy gradients
    Hi guys, I am new to RL and reading the Spinning Up tutorial, which focuses on policy-based algorithms. In the derivation of VPG, the tutorial says: "The environment has no dependence on $\theta$ (the parameters of the policy), so the gradient of $R(\tau)$ (the total return of the trajectory) with respect to $\theta$ is 0." However, the trajectory depends on our policy, and our policy depends on $\theta$. As a result, I am confused about why the total return of the trajectory is independent of $\theta$. submitted by /u/SkyRimT
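    For what it's worth, the step the tutorial is using can be sketched as follows (the standard score-function derivation, not the tutorial's exact notation). Since $R(\tau)$ is a fixed function of the sampled trajectory, $\theta$ enters only through the trajectory distribution $P(\tau \mid \theta)$:

```latex
\nabla_\theta \, \mathbb{E}_{\tau \sim \pi_\theta}[R(\tau)]
  = \nabla_\theta \int P(\tau \mid \theta)\, R(\tau)\, d\tau
  = \int R(\tau)\, \nabla_\theta P(\tau \mid \theta)\, d\tau
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\bigl[ R(\tau)\, \nabla_\theta \log P(\tau \mid \theta) \bigr].
```

The randomness over which trajectory is sampled depends on $\theta$, but the return of any one particular trajectory does not, which is why $\nabla_\theta R(\tau) = 0$ when the gradient moves inside the integral.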
    Vicarious exits: acquihired by Google robotics (Intrinsic) & DeepMind
    submitted by /u/gwern
    I don't understand why I am getting NaN loss scores. Can anyone explain what I am doing wrong?
    submitted by /u/brike3
    Are there applications of neural networks other than machine learning?
    I see lots of hardware oriented toward AI/ML stuff these days, including chips with hardware acceleration for neural networks. I'm thinking about how GPUs were initially designed for graphics calculations, but then things like CUDA and OpenCL were developed to make that hardware usable for broader applications of parallel processing. Are there any other things that you can do with a neural network besides backpropagation, that wouldn't be easier to do in other ways? submitted by /u/Bananawamajama
    My Paper Reviewing Load
    In academia, for better or worse, we have what’s called a peer review system, where papers get accepted to journals, conferences, or other venues on the basis of reviews from other researchers, who ideally are subject area experts and thus are qualified to evaluate the paper. The reviewers also cannot have a conflict of interest with the authors, and should not be overwhelmed with too many papers to review. This is the ideal world, and is not always what happens in practice. From my experience in the robotics academic community (and this may apply to other disciplines), it generally seems like there is no standard definition of an “appropriate” or “maximum” reviewing load for a reviewer. This is difficult to define as different papers mandate different reviewing efforts; a massive journal …
    A manifold learning approach for gesture recognition from micro-Doppler radar measurements. (arXiv:2110.01670v4 [cs.LG] UPDATED)
    A recent paper (Neural Networks, {\bf 132} (2020), 253-268) introduces a straightforward and simple kernel based approximation for manifold learning that does not require the knowledge of anything about the manifold, except for its dimension. In this paper, we examine how the pointwise error in approximation using least squares optimization based on similarly localized kernels depends upon the data characteristics and deteriorates as one goes away from the training data. The theory is presented with an abstract localized kernel, which can utilize any prior knowledge about the data being located on an unknown sub-manifold of a known manifold. We demonstrate the performance of our approach using a publicly available micro-Doppler data set, and investigate the use of different preprocessing measures, kernels, and manifold dimensions. Specifically, it is shown that the localized kernel introduced in the above mentioned paper when used with PCA components leads to a near-competitive performance to deep neural networks, and offers significant improvements in training speed and memory requirements. To demonstrate the fact that our methods are agnostic to the domain knowledge, we examine the classification problem in a simple video data set.
    Bayesian Learning via Neural Schr\"odinger-F\"ollmer Flows. (arXiv:2111.10510v8 [stat.ML] UPDATED)
    In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schr\"odinger bridges). We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
    Accurate detection of sepsis at ED triage using machine learning with clinical natural language processing. (arXiv:2204.07657v2 [cs.LG] UPDATED)
    Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Accurate detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to determine whether EHR data can be extracted and synthesized with the latest machine learning algorithms (KATE Sepsis) and clinical natural language processing to produce accurate sepsis models, and compare KATE Sepsis performance with existing sepsis screening protocols, such as SIRS and qSOFA. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis, SIRS, standard screening (SIRS with source of infection) and qSOFA were tested in three settings. Cohort-A was a retrospective analysis on medical records from a single Site 1. Cohort-B was a prospective analysis of Site 1. Cohort-C was a retrospective analysis on Site 1 with 15 additional sites. Across all cohorts, KATE Sepsis demonstrates an AUC of 0.94-0.963 with 73-74.87% TPR and 3.76-7.17% FPR. Standard screening demonstrates an AUC of 0.682-0.726 with 39.39-51.19% TPR and 2.9-6.02% FPR. The qSOFA protocol demonstrates an AUC of 0.544-0.56, with 10.52-13.18% TPR and 1.22-1.68% FPR. For severe sepsis, across all cohorts, KATE Sepsis demonstrates an AUC of 0.935-0.972 with 70-82.26% TPR and 4.64-8.62% FPR. For septic shock, across all cohorts, KATE Sepsis demonstrates an AUC of 0.96-0.981 with 85.71-89.66% TPR and 4.85-8.8% FPR. SIRS, standard screening, and qSOFA demonstrate low AUC and TPR for severe sepsis and septic shock detection. KATE Sepsis provided substantially better sepsis detection performance in triage than commonly used screening protocols.
    Visual Attention Methods in Deep Learning: An In-Depth Survey. (arXiv:2204.07756v2 [cs.CV] UPDATED)
    Inspired by the human cognitive system, attention is a mechanism that imitates the human cognitive awareness about specific information, amplifying critical details to focus more on the essential aspects of data. Deep learning has employed attention to boost performance for many applications. Interestingly, the same attention design can suit processing different data modalities and can easily be incorporated into large networks. Furthermore, multiple complementary attention mechanisms can be incorporated in one network. Hence, attention techniques have become extremely attractive. However, the literature lacks a comprehensive survey specific to attention techniques to guide researchers in employing attention in their deep models. Note that, besides being demanding in terms of training data and computational resources, transformers only cover a single category in self-attention out of the many categories available. We fill this gap and provide an in-depth survey of 50 attention techniques categorizing them by their most prominent features. We initiate our discussion by introducing the fundamental concepts behind the success of attention mechanism. Next, we furnish some essentials such as the strengths and limitations of each attention category, describe their fundamental building blocks, basic formulations with primary usage, and applications specifically for computer vision. We also discuss the challenges and open questions related to attention mechanism in general. Finally, we recommend possible future research directions for deep attention.
    On Distribution Shift in Learning-based Bug Detectors. (arXiv:2204.10049v1 [cs.LG])
    Deep learning has recently achieved initial success in program analysis tasks such as bug detection. Lacking real bugs, most existing works construct training and test data by injecting synthetic bugs into correct programs. Despite achieving high test accuracy (e.g. >90%), the resulting bug detectors are found to be surprisingly unusable in practice, i.e., <10% precision when used to scan real software repositories. In this work, we argue that this massive performance difference is caused by distribution shift, i.e., a fundamental mismatch between the real bug distribution and the synthetic bug distribution used to train and evaluate the detectors. To address this key challenge, we propose to train a bug detector in two phases, first on a synthetic bug distribution to adapt the model to the bug detection domain, and then on a real bug distribution to drive the model towards the real distribution. During these two phases, we leverage a multi-task hierarchy, focal loss, and contrastive learning to further boost performance. We evaluate our approach extensively on three widely studied bug types, for which we construct new datasets carefully designed to capture the real bug distribution. The results demonstrate that our approach is practically effective and successfully mitigates the distribution shift: our learned detectors are highly performant on both our constructed test set and the latest version of open source repositories.
    Persua: A Visual Interactive System to Enhance the Persuasiveness of Arguments in Online Discussion. (arXiv:2204.07741v2 [cs.HC] UPDATED)
    Persuading people to change their opinions is a common practice in online discussion forums on topics ranging from political campaigns to relationship consultation. Enhancing people's ability to write persuasive arguments could not only practice their critical thinking and reasoning but also contribute to the effectiveness and civility in online communication. It is, however, not an easy task in online discussion settings where written words are the primary communication channel. In this paper, we derived four design goals for a tool that helps users improve the persuasiveness of arguments in online discussions through a survey with 123 online forum users and interviews with five debating experts. To satisfy these design goals, we analyzed and built a labeled dataset of fine-grained persuasive strategies (i.e., logos, pathos, ethos, and evidence) in 164 arguments with high ratings on persuasiveness from ChangeMyView, a popular online discussion forum. We then designed an interactive visual system, Persua, which provides example-based guidance on persuasive strategies to enhance the persuasiveness of arguments. In particular, the system constructs portfolios of arguments based on different persuasive strategies applied to a given discussion topic. It then presents concrete examples based on the difference between the portfolios of user input and high-quality arguments in the dataset. A between-subjects study shows suggestive evidence that Persua encourages users to submit more times for feedback and helps users improve more on the persuasiveness of their arguments than a baseline system. Finally, a set of design considerations was summarized to guide future intelligent systems that improve the persuasiveness in text.
    Learning to Hash Naturally Sorts. (arXiv:2201.13322v2 [cs.CV] UPDATED)
    Learning to hash pictures a list-wise sorting problem. Its testing metrics, e.g., mean-average precision, count on a sorted candidate list ordered by pair-wise code similarity. However, scarcely does one train a deep hashing model with the sorted results end-to-end because of the non-differentiable nature of the sorting operation. This inconsistency in the objectives of training and test may lead to sub-optimal performance since the training loss often fails to reflect the actual retrieval metric. In this paper, we tackle this problem by introducing Naturally-Sorted Hashing (NSH). We sort the Hamming distances of samples' hash codes and accordingly gather their latent representations for self-supervised training. Thanks to the recent advances in differentiable sorting approximations, the hash head receives gradients from the sorter so that the hash encoder can be optimized along with the training procedure. Additionally, we describe a novel Sorted Noise-Contrastive Estimation (SortedNCE) loss that selectively picks positive and negative samples for contrastive learning, which allows NSH to mine data semantic relations during training in an unsupervised manner. Our extensive experiments show the proposed NSH model significantly outperforms the existing unsupervised hashing methods on three benchmarked datasets.
    Random Dilated Shapelet Transform: A New Approach for Time Series Shapelets. (arXiv:2109.13514v2 [cs.CV] UPDATED)
    Shapelet-based algorithms are widely used for time series classification because of their ease of interpretation, but they are currently outperformed by recent state-of-the-art approaches. We present a new formulation of time series shapelets including the notion of dilation, and we introduce a new shapelet feature to enhance their discriminative power for classification. Experiments performed on 112 datasets show that our method improves on the state-of-the-art shapelet algorithm, and achieves comparable accuracy to recent state-of-the-art approaches, without sacrificing either scalability or interpretability.
    Backplay: "Man muss immer umkehren". (arXiv:1807.06919v5 [cs.LG] UPDATED)
    Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
    Deep learning techniques for energy clustering in the CMS ECAL. (arXiv:2204.10277v1 [hep-ex])
    The reconstruction of electrons and photons in CMS depends on topological clustering of the energy deposited by an incident particle in different crystals of the electromagnetic calorimeter (ECAL). These clusters are formed by aggregating neighbouring crystals according to the expected topology of an electromagnetic shower in the ECAL. The presence of upstream material (beampipe, tracker and support structures) causes electrons and photons to start showering before reaching the calorimeter. This effect, combined with the 3.8T CMS magnetic field, leads to energy being spread in several clusters around the primary one. It is essential to recover the energy contained in these satellite clusters in order to achieve the best possible energy resolution for physics analyses. Historically satellite clusters have been associated to the primary cluster using a purely topological algorithm which does not attempt to remove spurious energy deposits from additional pileup interactions (PU). The performance of this algorithm is expected to degrade during LHC Run 3 (2022+) because of the larger average PU levels and the increasing levels of noise due to the ageing of the ECAL detector. New methods are being investigated that exploit state-of-the-art deep learning architectures like Graph Neural Networks (GNN) and self-attention algorithms. These more sophisticated models improve the energy collection and are more resilient to PU and noise, helping to preserve the electron and photon energy resolution achieved during LHC Runs 1 and 2. This work will cover the challenges of training the models as well as the opportunity that this new approach offers to unify the ECAL energy measurement with the particle identification steps used in the global CMS photon and electron reconstruction.
    Condition Monitoring of Transformer Bushings Using Computational Intelligence. (arXiv:2204.10193v1 [cs.LG])
    Dissolved Gas-in-oil Analysis (DGA) is used to monitor the condition of bushings on large power transformers. There are different techniques used in determining the conditions from the data collected, but in this work Artificial Intelligence techniques are investigated. This work investigates which gases in DGA are related to each other and which ones are important for making decisions. When the related and crucial gases are determined, the other gases are discarded, thereby reducing the number of attributes in DGA. Hence a further investigation is done to see how these new datasets influence the performance of the classifiers used to classify the DGA of full attributes. The classifiers used in these experiments were Backpropagation Neural Networks (BPNN) and Support Vector Machines (SVM), whereas Principal Component Analysis (PCA), Rough Sets (RS), Incremental Granular Ranking (GR++) and Decision Trees (DT) were used to reduce the attributes of the dataset. The parameters used when training the BPNN and SVM classifiers are kept fixed to create a controlled test environment when investigating the effects of reducing the number of gases. This work further introduces a new classifier that can handle high-dimensional and noisy datasets, the Rough Neural Network (RNN).
    Geometry-Aware Supertagging with Heterogeneous Dynamic Convolutions. (arXiv:2203.12235v2 [cs.CL] UPDATED)
    The syntactic categories of categorial grammar formalisms are structured units made of smaller, indivisible primitives, bound together by the underlying grammar's category formation rules. In the trending approach of constructive supertagging, neural models are increasingly made aware of the internal category structure, which in turn enables them to more reliably predict rare and out-of-vocabulary categories, with significant implications for grammars previously deemed too complex to find practical use. In this work, we revisit constructive supertagging from a graph-theoretic perspective, and propose a framework based on heterogeneous dynamic graph convolutions aimed at exploiting the distinctive structure of a supertagger's output space. We test our approach on a number of categorial grammar datasets spanning different languages and grammar formalisms, achieving substantial improvements over previous state of the art scores. Code will be made available at https://github.com/konstantinosKokos/dynamic-graph-supertagging
    Hybrid Cloud-Edge Collaborative Data Anomaly Detection in Industrial Sensor Networks. (arXiv:2204.09942v1 [cs.CR])
    Industrial control systems (ICSs) are facing increasing cyber-physical attacks that can cause catastrophes in the physical system. Efficient anomaly detection models for industrial sensor networks are essential for enhancing ICS reliability and security, since the sensor data reflect the operational state of the ICS. Considering the limited availability of computing resources, this paper proposes a hybrid anomaly detection approach for cloud-edge collaborative industrial sensor networks. The hybrid approach consists of sensor data detection models deployed at the edges and a sensor data analysis model deployed in the cloud. The sensor data detection model, based on Gaussian and Bayesian algorithms, can detect anomalous sensor data in real time and upload them to the cloud for further analysis, filtering out normal sensor data and reducing traffic load. The sensor data analysis model, based on a graph convolutional network, a residual algorithm, and a long short-term memory network (GCRL), can effectively extract spatial and temporal features and then identify attacks precisely. The proposed hybrid anomaly detection approach is evaluated using a benchmark dataset and baseline anomaly detection models. The experimental results show that the proposed approach achieves an overall 11.19% increase in recall and an impressive 14.29% improvement in F1-score, compared with existing models.
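As an illustrative sketch only (not the paper's actual edge model), a minimal Gaussian anomaly detector of the kind the abstract describes for the edge side could look like this; the readings, threshold, and function names are assumptions:

```python
import math

def fit_gaussian(samples):
    """Estimate mean and standard deviation of normal sensor readings."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, math.sqrt(var)

def is_anomalous(reading, mean, std, z_threshold=3.0):
    """Flag a reading whose z-score exceeds the threshold; only such
    readings would be uploaded to the cloud for deeper analysis."""
    if std == 0:
        return reading != mean
    return abs(reading - mean) / std > z_threshold

# Hypothetical window of normal sensor readings.
normal = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7]
mean, std = fit_gaussian(normal)
print(is_anomalous(10.1, mean, std))  # in-range reading -> False
print(is_anomalous(25.0, mean, std))  # far outside the window -> True
```

In a deployment such as the one described, only readings flagged `True` would cross the network, which is how the edge filter reduces traffic load.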
    Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data. (arXiv:2009.09139v3 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as overfitting to low-resource tasks, catastrophic forgetting, and negative task transfer (learning interference). Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single-task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms models that use MTL and single-task fine-tuning by 0.7-1.0%. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
    The Silent Problem -- Machine Learning Model Failure -- How to Diagnose and Fix Ailing Machine Learning Models. (arXiv:2204.10227v1 [cs.LG])
    The COVID-19 pandemic has dramatically changed how healthcare is delivered to patients, how patients interact with healthcare providers, and how healthcare information is disseminated to both healthcare providers and patients. Analytical models that were trained and tested pre-pandemic may no longer perform up to expectations, yielding unreliable and irrelevant machine learning (ML) models, given that ML depends on the basic principle that what happened in the past is likely to repeat in the future. ML faces two important degradation phenomena: concept drift, when the underlying properties and characteristics of the variables change, and data drift, when the data distributions, probabilities, covariates, and other variable relationships change; both are prime culprits of model failure. Therefore, detecting and diagnosing drift in existing models has become imperative. Perhaps even more important is a shift in our mindset towards a conscious recognition that drift is inevitable, and that model building must incorporate intentional resilience, the ability to offset and recover quickly from failure, and proactive robustness, avoiding failure by developing models that are less vulnerable to drift and disruption.
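To make the drift discussion concrete, here is a minimal, hypothetical data-drift check that compares a current window of a variable against a pre-pandemic reference window; the z-threshold and the window values are illustrative assumptions, not the article's method:

```python
import math

def mean_std(xs):
    """Sample mean and (population) standard deviation."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / len(xs)
    return m, math.sqrt(v)

def drift_detected(reference, current, z_threshold=3.0):
    """Crude data-drift check: test whether the mean of the current
    window has shifted significantly away from the reference window."""
    m_ref, s_ref = mean_std(reference)
    m_cur, _ = mean_std(current)
    se = s_ref / math.sqrt(len(current))
    if se == 0:
        se = 1e-12  # degenerate reference; any change counts as drift
    return abs(m_cur - m_ref) / se > z_threshold

reference = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0]   # pre-pandemic window
print(drift_detected(reference, [5.0, 5.1, 4.9, 5.0]))  # stable -> False
print(drift_detected(reference, [7.0, 7.2, 6.9, 7.1]))  # shifted -> True
```

Real drift monitoring would track full distributions (and concept drift requires labels), but even a mean-shift check like this captures the "detect before the model fails silently" idea.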
    A Revealing Large-Scale Evaluation of Unsupervised Anomaly Detection Algorithms. (arXiv:2204.09825v1 [cs.LG])
    Anomaly detection has many applications ranging from bank-fraud detection and cyber-threat detection to equipment maintenance and health monitoring. However, choosing a suitable algorithm for a given application remains a challenging design decision, often informed by the literature on anomaly detection algorithms. We extensively reviewed twelve of the most popular unsupervised anomaly detection methods. We observed that, so far, they have been compared using inconsistent protocols - the choice of the class of interest or the positive class, the split of training and test data, and the choice of hyperparameters - leading to ambiguous evaluations. This observation led us to define a coherent evaluation protocol which we then used to produce an updated and more precise picture of the relative performance of the twelve methods on five widely used tabular datasets. While our evaluation cannot pinpoint a method that outperforms all the others on all datasets, it identifies those that stand out and revises misconceptions about their relative performances.
    Multi-label classification for biomedical literature: an overview of the BioCreative VII LitCovid Track for COVID-19 literature topic annotations. (arXiv:2204.09781v1 [cs.DL])
    The COVID-19 pandemic has been severely impacting global society since December 2019. Massive research has been undertaken to understand the characteristics of the virus and design vaccines and drugs. The related findings have been reported in biomedical literature at a rate of about 10,000 articles on COVID-19 per month. Such rapid growth significantly challenges manual curation and interpretation. For instance, LitCovid is a literature database of COVID-19-related articles in PubMed, which has accumulated more than 200,000 articles with millions of accesses each month by users worldwide. One primary curation task is to assign up to eight topics (e.g., Diagnosis and Treatment) to the articles in LitCovid. Despite the continuing advances in biomedical text mining methods, few have been dedicated to topic annotations in COVID-19 literature. To close the gap, we organized the BioCreative LitCovid track to call for a community effort to tackle automated topic annotation for COVID-19 literature. The BioCreative LitCovid dataset, consisting of over 30,000 articles with manually reviewed topics, was created for training and testing. It is one of the largest multilabel classification datasets in biomedical scientific literature. 19 teams worldwide participated and made 80 submissions in total. Most teams used hybrid systems based on transformers. The highest performing submissions achieved 0.8875, 0.9181, and 0.9394 for macro F1-score, micro F1-score, and instance-based F1-score, respectively. The level of participation and results demonstrate a successful track and help close the gap between dataset curation and method development. The dataset is publicly available via https://ftp.ncbi.nlm.nih.gov/pub/lu/LitCovid/biocreative/ for benchmarking and further development.
    Accurate Molecular-Orbital-Based Machine Learning Energies via Unsupervised Clustering of Chemical Space. (arXiv:2204.09831v1 [physics.chem-ph])
    We introduce an unsupervised clustering algorithm to improve training efficiency and accuracy in predicting energies using molecular-orbital-based machine learning (MOB-ML). This work determines clusters via the Gaussian mixture model (GMM) in an entirely automatic manner and simplifies an earlier supervised clustering approach [J. Chem. Theory Comput., 15, 6668 (2019)] by eliminating both the necessity for user-specified parameters and the training of an additional classifier. Unsupervised clustering results from GMM have the advantage of accurately reproducing chemically intuitive groupings of frontier molecular orbitals and having improved performance with an increasing number of training examples. The resulting clusters from supervised or unsupervised clustering are further combined with scalable Gaussian process regression (GPR) or linear regression (LR) to learn molecular energies accurately by generating a local regression model in each cluster. Among all four combinations of regressors and clustering methods, GMM combined with scalable exact Gaussian process regression (GMM/GPR) is the most efficient training protocol for MOB-ML. The numerical tests of molecular energy learning on thermalized datasets of drug-like molecules demonstrate the improved accuracy, transferability, and learning efficiency of GMM/GPR over other training protocols for MOB-ML, i.e., supervised regression-clustering combined with GPR (RC/GPR) and GPR without clustering. GMM/GPR also provides the best molecular energy predictions compared with those reported in the literature on the same benchmark datasets. With a lower scaling, GMM/GPR has a 10.4-fold speedup in wall-clock training time compared with scalable exact GPR with a training size of 6500 QM7b-T molecules.
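As a toy illustration of the cluster-then-locally-regress idea (not MOB-ML itself), the sketch below clusters a 1-D feature with a simple 2-means step standing in for the GMM and fits a separate least-squares line per cluster; all data, names, and the choice of 2-means are assumptions:

```python
def two_means(xs, iters=20):
    """Minimal 1-D 2-means clustering (a crude stand-in for the GMM
    step; assumes both clusters stay non-empty for this toy data)."""
    c0, c1 = min(xs), max(xs)
    for _ in range(iters):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0, c1 = sum(g0) / len(g0), sum(g1) / len(g1)
    return c0, c1

def fit_line(pairs):
    """Closed-form least-squares fit y = a*x + b (the local LR model)."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs); sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs); sxy = sum(x * y for x, y in pairs)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

# Two regimes obeying different linear laws: cluster first, fit locally.
data = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0),           # regime A: y = x + 1
        (10.0, -10.0), (11.0, -11.0), (12.0, -12.0)]  # regime B: y = -x
c0, c1 = two_means([x for x, _ in data])
clusters = {0: [], 1: []}
for x, y in data:
    clusters[0 if abs(x - c0) <= abs(x - c1) else 1].append((x, y))
local_models = {k: fit_line(v) for k, v in clusters.items()}
```

A single global line would fit this data badly; one model per cluster recovers both regimes exactly, which is the benefit the abstract attributes to local regression models.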
    Memory Bounds for the Experts Problem. (arXiv:2204.09837v1 [cs.DS])
    Online learning with expert advice is a fundamental problem of sequential prediction. In this problem, the algorithm has access to a set of $n$ "experts" who make predictions on each day. The goal on each day is to process these predictions, and make a prediction with the minimum cost. After making a prediction, the algorithm sees the actual outcome on that day, updates its state, and then moves on to the next day. An algorithm is judged by how well it does compared to the best expert in the set. The classical algorithm for this problem is the multiplicative weights algorithm. However, every application, to our knowledge, relies on storing weights for every expert, and uses $\Omega(n)$ memory. There is little work on understanding the memory required to solve the online learning with expert advice problem, or run standard sequential prediction algorithms, in natural streaming models, which is especially important when the number of experts, as well as the number of days on which the experts make predictions, is large. We initiate the study of the learning with expert advice problem in the streaming setting, and show lower and upper bounds. Our lower bound for i.i.d., random order, and adversarial order streams uses a reduction to a custom-built problem using a novel masking technique, to show a smooth trade-off for regret versus memory. Our upper bounds show novel ways to run standard sequential prediction algorithms in rounds on small "pools" of experts, thus reducing the necessary memory. For random-order streams, we show that our upper bound is tight up to low order terms. We hope that these results and techniques will have broad applications in online learning, and can inspire algorithms based on standard sequential prediction techniques, like multiplicative weights, for a wide range of other problems in the memory-constrained setting.
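For reference, the classical multiplicative weights algorithm mentioned above can be sketched as follows; note that it stores one weight per expert, which is exactly the $\Omega(n)$ memory the paper seeks to reduce (the losses and learning rate here are illustrative):

```python
def multiplicative_weights(expert_losses, eta=0.5):
    """Classical multiplicative weights over a sequence of days.

    expert_losses: list of per-day loss vectors, one entry per expert,
    with losses in [0, 1]. Returns the algorithm's expected total loss
    and the final weight vector (one weight per expert -> O(n) memory).
    """
    n = len(expert_losses[0])
    w = [1.0] * n
    total_loss = 0.0
    for losses in expert_losses:            # one round per day
        s = sum(w)
        # expected loss of predicting with probabilities p_i = w_i / s
        total_loss += sum(w[i] / s * losses[i] for i in range(n))
        # downweight experts in proportion to their loss
        w = [w[i] * (1.0 - eta * losses[i]) for i in range(n)]
    return total_loss, w

# Expert 0 is always right (loss 0), expert 1 always wrong (loss 1):
days = [[0.0, 1.0]] * 10
total, w = multiplicative_weights(days)
print(w)  # expert 1's weight has decayed geometrically
```

The streaming question the paper studies is how much of this per-expert state can be dropped (e.g., by running the update on small "pools" of experts) while keeping regret low.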
    FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis. (arXiv:2204.09934v1 [eess.AS])
    Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, the costly iterative sampling process they inherit has hindered their application to speech synthesis. This paper proposes FastDiff, a fast conditional diffusion model for high-quality speech synthesis. FastDiff employs a stack of time-aware location-variable convolutions of diverse receptive field patterns to efficiently model long-term time dependencies with adaptive conditions. A noise schedule predictor is also adopted to reduce the sampling steps without sacrificing the generation quality. Based on FastDiff, we design an end-to-end text-to-speech synthesizer, FastDiff-TTS, which generates high-fidelity speech waveforms without any intermediate feature (e.g., Mel-spectrogram). Our evaluation of FastDiff demonstrates state-of-the-art results with higher-quality (MOS 4.28) speech samples. Also, FastDiff enables a sampling speed 58x faster than real-time on a V100 GPU, making diffusion models practically applicable to speech synthesis deployment for the first time. We further show that FastDiff generalizes well to the mel-spectrogram inversion of unseen speakers, and FastDiff-TTS outperforms other competing methods in end-to-end text-to-speech synthesis. Audio samples are available at \url{https://FastDiff.github.io/}.
    Eliminating Backdoor Triggers for Deep Neural Networks Using Attention Relation Graph Distillation. (arXiv:2204.09975v1 [cs.LG])
    Due to the prosperity of Artificial Intelligence (AI) techniques, more and more backdoors are designed by adversaries to attack Deep Neural Networks (DNNs). Although the state-of-the-art method Neural Attention Distillation (NAD) can effectively erase backdoor triggers from DNNs, it still suffers from a non-negligible Attack Success Rate (ASR) together with lowered classification ACCuracy (ACC), since NAD focuses on backdoor defense using attention features (i.e., attention maps) of the same order. In this paper, we introduce a novel backdoor defense framework named Attention Relation Graph Distillation (ARGD), which fully explores the correlation among attention features with different orders using our proposed Attention Relation Graphs (ARGs). Based on the alignment of ARGs between both teacher and student models during knowledge distillation, ARGD can eradicate more backdoor triggers than NAD. Comprehensive experimental results show that, against six latest backdoor attacks, ARGD outperforms NAD by up to a 94.85% reduction in ASR, while ACC can be improved by up to 3.23%.
    Social Media Sentiment Analysis for Cryptocurrency Market Prediction. (arXiv:2204.10185v1 [cs.CL])
    In this paper, we explore the usability of different natural language processing models for the sentiment analysis of social media applied to financial market prediction, using the cryptocurrency domain as a reference. We study how the different sentiment metrics are correlated with the price movements of Bitcoin. For this purpose, we explore different methods to calculate the sentiment metrics from a text finding most of them not very accurate for this prediction task. We find that one of the models outperforms more than 20 other public ones and makes it possible to fine-tune it efficiently given its interpretable nature. Thus we confirm that interpretable artificial intelligence and natural language processing methods might be more valuable practically than non-explainable and non-interpretable ones. In the end, we analyse potential causal connections between the different sentiment metrics and the price movements.
    FedCL: Federated Contrastive Learning for Privacy-Preserving Recommendation. (arXiv:2204.09850v1 [cs.LG])
    Contrastive learning is widely used for recommendation model learning, where selecting representative and informative negative samples is critical. Existing methods usually focus on centralized data, where abundant and high-quality negative samples are easy to obtain. However, centralized user data storage and exploitation may lead to privacy risks and concerns, while decentralized user data on a single client can be too sparse and biased for accurate contrastive learning. In this paper, we propose a federated contrastive learning method named FedCL for privacy-preserving recommendation, which can exploit high-quality negative samples for effective model training with privacy well protected. We first infer user embeddings from local user data through the local model on each client, and then perturb them with local differential privacy (LDP) before sending them to a central server for hard negative sampling. Since individual user embedding contains heavy noise due to LDP, we propose to cluster user embeddings on the server to mitigate the influence of noise, and the cluster centroids are used to retrieve hard negative samples from the item pool. These hard negative samples are delivered to user clients and mixed with the observed negative samples from local data as well as in-batch negatives constructed from positive samples for federated model training. Extensive experiments on four benchmark datasets show FedCL can empower various recommendation methods in a privacy-preserving way.
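A minimal sketch of the client-side LDP perturbation and server-side centroid step described above (an assumption-laden illustration, not FedCL's implementation; the noise mechanism, sensitivity, and function names are all assumptions) could look like:

```python
import math
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    u = max(min(u, 0.499999), -0.499999)  # guard against log(0)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def perturb_embedding(emb, epsilon, sensitivity=1.0, rng=random):
    """Client side: add Laplace noise to a user embedding before it
    leaves the device (scale = sensitivity / epsilon)."""
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in emb]

def centroid(embs):
    """Server side: averaging noisy embeddings within a cluster cancels
    much of the zero-mean LDP noise before hard-negative retrieval."""
    d = len(embs[0])
    return [sum(e[i] for e in embs) / len(embs) for i in range(d)]

rng = random.Random(0)
noisy = [perturb_embedding([1.0, -1.0], epsilon=1.0, rng=rng)
         for _ in range(2000)]
print(centroid(noisy))  # close to the true embedding [1.0, -1.0]
```

This is the intuition behind clustering on the server: a single noisy embedding is unusable, but the cluster centroid is a low-noise proxy that can safely drive hard-negative sampling.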
    Adversarial Contrastive Learning by Permuting Cluster Assignments. (arXiv:2204.10314v1 [cs.LG])
    Contrastive learning has gained popularity as an effective self-supervised representation learning technique. Several research directions improve traditional contrastive approaches, e.g., prototypical contrastive methods better capture the semantic similarity among instances and reduce the computational burden by considering cluster prototypes or cluster assignments, while adversarial instance-wise contrastive methods improve robustness against a variety of attacks. To the best of our knowledge, no prior work jointly considers robustness, cluster-wise semantic similarity and computational efficiency. In this work, we propose SwARo, an adversarial contrastive framework that incorporates cluster assignment permutations to generate representative adversarial samples. We evaluate SwARo on multiple benchmark datasets and against various white-box and black-box attacks, obtaining consistent improvements over state-of-the-art baselines.
    TND-NAS: Towards Non-Differentiable Objectives in Differentiable Neural Architecture Search. (arXiv:2111.03892v2 [cs.LG] UPDATED)
    Differentiable architecture search has gradually become the mainstream research topic in the field of Neural Architecture Search (NAS) owing to its high efficiency compared with the early NAS methods (EA-based, RL-based). Recent differentiable NAS also aims at further improving search performance and reducing GPU-memory consumption. However, these methods are no longer naturally capable of tackling non-differentiable objectives, e.g., energy, resource-constrained efficiency, and other metrics, let alone multi-objective search demands. Research in the multi-objective NAS field targets this but requires vast computational resources because each candidate architecture is optimized separately. In light of this discrepancy, we propose TND-NAS, which combines the high efficiency of the differentiable NAS framework with the compatibility with non-differentiable metrics of multi-objective NAS. Under the differentiable NAS framework, with continuous relaxation of the search space, TND-NAS optimizes the architecture parameters ($\alpha$) in discrete space, while resorting to progressive search space shrinking by $\alpha$. Our representative experiment takes two objectives (parameters, accuracy) as an example: we achieve a series of high-performance compact architectures on the CIFAR10 (1.09M/3.3\%, 2.4M/2.95\%, 9.57M/2.54\%) and CIFAR100 (2.46M/18.3\%, 5.46M/16.73\%, 12.88M/15.20\%) datasets. Favorably, compared with other multi-objective NAS methods, TND-NAS is less time-consuming (1.3 GPU-days on an NVIDIA 1080Ti, 1/6 of that of NSGA-Net), and can be conveniently adapted to real-world NAS scenarios (resource-constrained, platform-specialized).
    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. (arXiv:2106.09226v2 [cs.LG] UPDATED)
    Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.  ( 2 min )
    Surfer100: Generating Surveys From Web Resources on Wikipedia-style. (arXiv:2112.06377v2 [cs.CL] UPDATED)
    Fast-developing fields such as Artificial Intelligence (AI) often outpace the efforts of encyclopedic sources such as Wikipedia, which either do not completely cover recently-introduced topics or lack such content entirely. As a result, methods for automatically producing content are valuable tools to address this information overload. We show that recent advances in pretrained language modeling can be combined for a two-stage extractive and abstractive approach for Wikipedia lead paragraph generation. We extend this approach to generate longer Wikipedia-style summaries with sections and examine how such methods struggle in this application through detailed studies with 100 reference human-collected surveys. To the best of our knowledge, this is the first study on utilizing web resources for long Wikipedia-style summaries.  ( 2 min )
    Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity. (arXiv:2111.05329v3 [cs.CV] UPDATED)
    We present CrissCross, a self-supervised framework for learning audio-visual representations. A novel notion is introduced in our framework whereby in addition to learning the intra-modal and standard synchronous cross-modal relations, CrissCross also learns asynchronous cross-modal relationships. We show that by relaxing the temporal synchronicity between the audio and visual modalities, the network learns strong generalized representations. Our experiments show that strong augmentations for both audio and visual modalities with relaxation of cross-modal temporal synchronicity optimize performance. To pretrain our proposed framework, we use 3 different datasets with varying sizes, Kinetics-Sound, Kinetics400, and AudioSet. The learned representations are evaluated on a number of downstream tasks namely action recognition, sound classification, and retrieval. CrissCross shows state-of-the-art performances on action recognition (UCF101 and HMDB51) and sound classification (ESC50 and DCASE). The codes and pretrained models will be made publicly available.  ( 2 min )
    NICO++: Towards Better Benchmarking for Domain Generalization. (arXiv:2204.08040v2 [cs.CV] UPDATED)
    Despite the remarkable performance that modern deep neural networks have achieved on independent and identically distributed (I.I.D.) data, they can crash under distribution shifts. Most current evaluation methods for domain generalization (DG) adopt the leave-one-out strategy as a compromise on the limited number of domains. We propose a large-scale benchmark with extensive labeled domains named NICO++ along with more rational evaluation methods for comprehensively evaluating DG algorithms. To evaluate DG datasets, we propose two metrics to quantify covariate shift and concept shift, respectively. Two novel generalization bounds from the perspective of data construction are proposed to prove that limited concept shift and significant covariate shift favor the evaluation capability for generalization. Through extensive experiments, NICO++ shows its superior evaluation capability compared with current DG datasets and its contribution in alleviating unfairness caused by the leak of oracle knowledge in model selection.
    Relevance-guided Unsupervised Discovery of Abilities with Quality-Diversity Algorithms. (arXiv:2204.09828v1 [cs.NE])
    Quality-Diversity algorithms provide efficient mechanisms to generate large collections of diverse and high-performing solutions, which have been shown to be instrumental for solving downstream tasks. However, most of those algorithms rely on a behavioural descriptor to characterise the diversity that is hand-coded, hence requiring prior knowledge about the considered tasks. In this work, we introduce Relevance-guided Unsupervised Discovery of Abilities; a Quality-Diversity algorithm that autonomously finds a behavioural characterisation tailored to the task at hand. In particular, our method introduces a custom diversity metric that leads to higher densities of solutions near the areas of interest in the learnt behavioural descriptor space. We evaluate our approach on a simulated robotic environment, where the robot has to autonomously discover its abilities based on its full sensory data. We evaluated the algorithm on three tasks: navigation to random targets, moving forward with a high velocity, and performing half-rolls. The experimental results show that our method manages to discover collections of solutions that are not only diverse, but also well-adapted to the considered downstream task.
    DeepGate: Learning Neural Representations of Logic Gates. (arXiv:2111.14616v3 [cs.LG] UPDATED)
    Applying deep learning (DL) techniques in the electronic design automation (EDA) field has become a trending topic. Most solutions apply well-developed DL models to solve specific EDA problems. While demonstrating promising results, they require careful model tuning for every problem. The fundamental question on "How to obtain a general and effective neural representation of circuits?" has not been answered yet. In this work, we take the first step towards solving this problem. We propose DeepGate, a novel representation learning solution that effectively embeds both logic function and structural information of a circuit as vectors on each gate. Specifically, we propose transforming circuits into unified and-inverter graph format for learning and using signal probabilities as the supervision task in DeepGate. We then introduce a novel graph neural network that uses strong inductive biases in practical circuits as learning priors for signal probability prediction. Our experimental results show the efficacy and generalization capability of DeepGate.
    BTranspose: Bottleneck Transformers for Human Pose Estimation with Self-Supervised Pre-Training. (arXiv:2204.10209v1 [cs.LG])
    The task of 2D human pose estimation is challenging as the number of keypoints is typically large (~17), and this necessitates the use of robust neural network architectures and training pipelines that can capture the relevant features from the input image. These features are then aggregated to make accurate heatmap predictions from which the final keypoints of human body parts can be inferred. Many papers in the literature use CNN-based architectures for the backbone, and/or combine them with a transformer, after which the features are aggregated to make the final keypoint predictions [1]. In this paper, we consider the recently proposed Bottleneck Transformers [2], which combine CNN and multi-head self-attention (MHSA) layers effectively, and we integrate them with a Transformer encoder and apply them to the task of 2D human pose estimation. We consider different backbone architectures and pre-train them using the DINO self-supervised learning method [3]; this pre-training is found to improve the overall prediction accuracy. We call our model BTranspose, and experiments show that on the COCO validation set, our model achieves an AP of 76.4, which is competitive with other methods such as [1] and has fewer network parameters. Furthermore, we also present the dependencies of the final predicted keypoints on both the MHSA block and the Transformer encoder layers, providing clues on the image sub-regions the network attends to at the mid and high levels.
    Understanding the Domain Gap in LiDAR Object Detection Networks. (arXiv:2204.10024v1 [cs.CV])
    In order to make autonomous driving a reality, artificial neural networks have to work reliably in the open-world. However, the open-world is vast and continuously changing, so it is not technically feasible to collect and annotate training datasets which accurately represent this domain. Therefore, there are always domain gaps between training datasets and the open-world which must be understood. In this work, we investigate the domain gaps between high-resolution and low-resolution LiDAR sensors in object detection networks. Using a unique dataset, which enables us to study sensor resolution domain gaps independent of other effects, we show two distinct domain gaps - an inference domain gap and a training domain gap. The inference domain gap is characterised by a strong dependence on the number of LiDAR points per object, while the training gap shows no such dependence. These findings show that different approaches are required to close these inference and training domain gaps.
    Towards Reliable Neural Generative Modeling of Detectors. (arXiv:2204.09947v1 [physics.ins-det])
    The increasing luminosities of future data taking at the Large Hadron Collider and next-generation collider experiments require an unprecedented number of simulated events to be produced. Such large-scale productions demand a significant amount of valuable computing resources. This creates a demand for new approaches to event generation and simulation of detector responses. In this paper, we discuss the application of generative adversarial networks (GANs) to the simulation of LHCb experiment events. We emphasize the main pitfalls in the application of GANs and study the systematic effects in detail. The presented results are based on the Geant4 simulation of the LHCb Cherenkov detector.
    Conditional entropy minimization principle for learning domain invariant representation features. (arXiv:2201.10460v3 [cs.LG] UPDATED)
    Invariance principle-based methods, for example, Invariant Risk Minimization (IRM), have recently emerged as promising approaches for Domain Generalization (DG). Despite the promising theory, invariance principle-based approaches fail in common classification tasks due to the mixture of the true invariant features and the spurious invariant features. In this paper, we propose a framework based on the conditional entropy minimization principle to filter out the spurious invariant features leading to a new algorithm with a better generalization capability. We theoretically prove that under some particular assumptions, the representation function can precisely recover the true invariant features. In addition, we also show that the proposed approach is closely related to the well-known Information Bottleneck (IB) framework. Both the theoretical and numerical results are provided to justify our approach.
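To make the principle concrete, here is a small, self-contained estimate of the empirical conditional entropy H(Y|Z) from (feature, label) samples; in the paper's terms, a truly invariant feature z that determines the label yields H(Y|Z) = 0, while a spurious feature leaves entropy behind (the data and names here are hypothetical):

```python
import math
from collections import Counter

def conditional_entropy(pairs):
    """Empirical H(Y | Z) in bits from (z, y) samples:
    H(Y|Z) = sum_z p(z) * H(Y | Z=z)."""
    n = len(pairs)
    by_z = {}
    for z, y in pairs:
        by_z.setdefault(z, []).append(y)
    h = 0.0
    for z, ys in by_z.items():
        pz = len(ys) / n
        counts = Counter(ys)
        h += pz * -sum((c / len(ys)) * math.log2(c / len(ys))
                       for c in counts.values())
    return h

# Feature that determines the label -> zero conditional entropy.
invariant = [(0, 'a'), (0, 'a'), (1, 'b'), (1, 'b')]
# Feature independent of the label -> one full bit of entropy remains.
spurious = [(0, 'a'), (0, 'b'), (1, 'a'), (1, 'b')]
print(conditional_entropy(invariant))  # -> 0.0
print(conditional_entropy(spurious))   # -> 1.0
```

Minimizing this quantity over candidate representations is the intuition behind the filtering criterion: features that leave high conditional entropy carry spurious rather than truly invariant information.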
    Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions. (arXiv:2204.10022v1 [cs.LG])
    Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics. Recent work focuses on designing neural-network architectures and regularization functions to allow for scalable estimation of average and individual-level dose response curves from high-dimensional, large-sample data. Such methodologies assume ignorability (all confounding variables are observed) and positivity (all levels of treatment can be observed for every unit described by a given covariate value), which are especially challenged in the continuous treatment regime. Developing scalable sensitivity and uncertainty analyses that allow us to understand the ignorance induced in our estimates when these assumptions are relaxed has received less attention. Here, we develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding. We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data. We validate our methods using both synthetic and real-world experiments. For the latter, we work in concert with climate scientists interested in evaluating the climatological impacts of human emissions on cloud properties using satellite observations from the past 15 years: a finite-data problem known to be complicated by the presence of a multitude of unobserved confounders.
    Intact-VAE: Estimating Treatment Effects under Unobserved Confounding. (arXiv:2101.06662v3 [stat.ML] UPDATED)
    NOTE: This preprint has a flawed theoretical formulation. Please avoid it and refer to the ICLR22 publication https://openreview.net/forum?id=q7n2RngwOM. Also, arXiv:2109.15062 contains some new ideas on unobserved confounding. As an important problem of causal inference, we discuss the identification and estimation of treatment effects under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying treatment effects. We theoretically show that, under certain settings, treatment effects are identified by our model, and further, based on the identifiability of our model (i.e., determinacy of representation), our VAE is a consistent estimator with representation balanced for treatment groups. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings.
    Learning Forward Dynamics Model and Informed Trajectory Sampler for Safe Quadruped Navigation. (arXiv:2204.08647v3 [cs.RO] UPDATED)
    For autonomous quadruped robot navigation in various complex environments, a typical SOTA system is composed of four main modules -- mapper, global planner, local planner, and command-tracking controller -- in a hierarchical manner. In this paper, we build a robust and safe local planner which is designed to generate a velocity plan to track a coarsely planned path from the global planner. Previous works used waypoint-based methods (e.g. Proportional-Differential control and pure pursuit) which simplify the path tracking problem to local point-goal navigation. However, they suffer from frequent collisions in geometrically complex and narrow environments for two reasons: the global planner uses a coarse and inaccurate model, and the local planner is unable to track the global plan sufficiently well. Currently, deep learning methods are an appealing alternative because they can learn safety and path feasibility from experience more accurately. However, existing deep learning methods are not capable of planning for a long horizon. In this work, we propose a learning-based fully autonomous navigation framework composed of three innovative elements: a learned forward dynamics model (FDM), an online sampling-based model-predictive controller, and an informed trajectory sampler (ITS). Using our framework, a quadruped robot can autonomously navigate in various complex environments without a collision and generate a smoother command plan compared to the baseline method. Furthermore, our method can reactively handle unexpected obstacles on the planned path and avoid them. Project page https://awesomericky.github.io/projects/FDM_ITS_navigation/.
    Revisiting Consistency Regularization for Semi-supervised Change Detection in Remote Sensing Images. (arXiv:2204.08454v3 [cs.CV] UPDATED)
    Remote-sensing (RS) Change Detection (CD) aims to detect "changes of interest" from co-registered bi-temporal images. The performance of existing deep supervised CD methods is attributed to the large amounts of annotated data used to train the networks. However, annotating large amounts of remote sensing images is labor-intensive and expensive, particularly with bi-temporal images, as it requires pixel-wise comparisons by a human expert. On the other hand, we often have access to unlimited unlabeled multi-temporal RS imagery thanks to ever-increasing earth observation programs. In this paper, we propose a simple yet effective way to leverage the information from unlabeled bi-temporal images to improve the performance of CD approaches. More specifically, we propose a semi-supervised CD model in which we formulate an unsupervised CD loss in addition to the supervised Cross-Entropy (CE) loss by constraining the output change probability map of a given unlabeled bi-temporal image pair to be consistent under the small random perturbations applied on the deep feature difference map that is obtained by subtracting their latent feature representations. Experiments conducted on two publicly available CD datasets show that the proposed semi-supervised CD method can approach the performance of supervised CD even with access to as little as 10% of the annotated training data. Code available at https://github.com/wgcban/SemiCD
    Wrapped Distributions on homogeneous Riemannian manifolds. (arXiv:2204.09790v1 [math.ST])
    We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality, yields a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.
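    The paper's construction targets general homogeneous Riemannian manifolds; as a minimal sketch of the underlying wrapping idea only (not the paper's framework or code), the classical wrapped normal on the circle pushes a Euclidean Gaussian through the surjective map z -> z mod 2*pi:

```python
import numpy as np

def sample_wrapped_normal(mu, sigma, size, rng=None):
    """Sample a wrapped normal on the circle S^1.

    Draws from N(mu, sigma^2) in the tangent space and wraps onto
    [0, 2*pi) -- the simplest instance of pushing a Euclidean
    distribution onto a manifold through a surjective map.
    """
    rng = np.random.default_rng(rng)
    z = rng.normal(loc=mu, scale=sigma, size=size)  # Euclidean sample
    return np.mod(z, 2 * np.pi)                     # wrap onto the circle

samples = sample_wrapped_normal(mu=0.0, sigma=0.3, size=10_000, rng=0)
# Every sample lies in the circle's coordinate chart [0, 2*pi).
```

    Sampling stays trivial because the wrapping map is applied after an ordinary Euclidean draw, which is what makes such distributions convenient inside Monte Carlo algorithms and latent variable models.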
    Debiased Learning from Naturally Imbalanced Pseudo-Labels. (arXiv:2201.01490v2 [cs.LG] UPDATED)
    Pseudo-labels are confident predictions made on unlabeled target data by a classifier trained on labeled source data. They are widely used for adapting a model to unlabeled data, e.g., in a semi-supervised learning setting. Our key insight is that pseudo-labels are naturally imbalanced due to intrinsic data similarity, even when a model is trained on balanced source data and evaluated on balanced target data. If we address this previously unknown imbalanced classification problem arising from pseudo-labels instead of ground-truth training labels, we could remove model biases towards false majorities created by pseudo-labels. We propose a novel and effective debiased learning method with pseudo-labels, based on counterfactual reasoning and adaptive margins: The former removes the classifier response bias, whereas the latter adjusts the margin of each class according to the imbalance of pseudo-labels. Validated by extensive experimentation, our simple debiased learning delivers significant accuracy gains over the state-of-the-art on ImageNet-1K: 26% for semi-supervised learning with 0.2% annotations and 9% for zero-shot learning. Our code is available at: https://github.com/frank-xwang/debiased-pseudo-labeling.
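    The abstract does not give implementation details; as a hedged sketch of the adaptive-margin half only (the counterfactual part is omitted), a log-prior margin in the spirit of logit adjustment can debias predictions toward pseudo-label minorities. All names and values below are illustrative:

```python
import numpy as np

def adjust_logits(logits, pseudo_label_counts, tau=1.0):
    """Subtract a log-prior margin so classes over-represented in the
    pseudo-labels need a larger raw logit to win the argmax."""
    prior = pseudo_label_counts / pseudo_label_counts.sum()
    return logits - tau * np.log(prior + 1e-12)

logits = np.array([[2.0, 1.9, 0.1]])          # near-tie, biased to class 0
counts = np.array([600.0, 300.0, 100.0])      # imbalanced pseudo-label pool
debiased = adjust_logits(logits, counts)
# The near-tie between classes 0 and 1 is resolved in favour of the
# rarer class 1 once the per-class margin is applied.
```

    The margin grows as a class becomes rarer in the pseudo-labels, so false majorities created by intrinsic data similarity no longer dominate the argmax.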
    Cross-Speaker Emotion Transfer for Low-Resource Text-to-Speech Using Non-Parallel Voice Conversion with Pitch-Shift Data Augmentation. (arXiv:2204.10020v1 [eess.AS])
    Data augmentation via voice conversion (VC) has been successfully applied to low-resource expressive text-to-speech (TTS) when only neutral data for the target speaker are available. Although the quality of VC is crucial for this approach, it is challenging to learn a stable VC model because the amount of data is limited in low-resource scenarios, and highly expressive speech has large acoustic variety. To address this issue, we propose a novel data augmentation method that combines pitch-shifting and VC techniques. Because pitch-shift data augmentation enables the coverage of a variety of pitch dynamics, it greatly stabilizes training for both VC and TTS models, even when only 1,000 utterances of the target speaker's neutral data are available. Subjective test results showed that a FastSpeech 2-based emotional TTS system with the proposed method improved naturalness and emotional similarity compared with conventional methods.
    One-Step Abductive Multi-Target Learning with Diverse Noisy Samples: An Application to Tumour Segmentation for Breast Cancer. (arXiv:2110.10325v5 [cs.LG] UPDATED)
    Recent studies have demonstrated the effectiveness of the combination of machine learning and logical reasoning in inventing advanced artificial intelligence technologies. One-step abductive multi-target learning (OSAMTL), an approach that only combines machine learning and logical reasoning in a one-step balanced way, has likewise shown its effectiveness in handling complex noisy labels of a single noisy sample in medical histopathology whole slide image analysis (MHWSIA). However, OSAMTL is not suitable for the situation where diverse noisy samples (DiNS) are provided for a learning task. In this paper, we give a definition of DiNS and propose one-step abductive multi-target learning with DiNS (OSAMTL-DiNS) to expand the original OSAMTL to handle complex noisy labels of DiNS. Applying OSAMTL-DiNS to tumour segmentation for breast cancer in MHWSIA, we show that OSAMTL-DiNS is able to enable various state-of-the-art approaches for learning from noisy labels to achieve more rational predictions.  ( 2 min )
    Dynamical simulation via quantum machine learning with provable generalization. (arXiv:2204.10269v1 [quant-ph])
    Much attention has been paid to dynamical simulation and quantum machine learning (QML) independently as applications for quantum advantage, while the possibility of using QML to enhance dynamical simulations has not been thoroughly investigated. Here we develop a framework for using QML methods to simulate quantum dynamics on near-term quantum hardware. We use generalization bounds, which bound the error a machine learning model makes on unseen data, to rigorously analyze the training data requirements of an algorithm within this framework. This provides a guarantee that our algorithm is resource-efficient, both in terms of qubit and data requirements. Our numerics exhibit efficient scaling with problem size, and we simulate 20 times longer than Trotterization on IBMQ-Bogota.  ( 2 min )
    Handling Imbalanced Classification Problems With Support Vector Machines via Evolutionary Bilevel Optimization. (arXiv:2204.10231v1 [cs.LG])
    Support vector machines (SVMs) are popular learning algorithms to deal with binary classification problems. They traditionally assume equal misclassification costs for each class; however, real-world problems may have an uneven class distribution. This article introduces EBCS-SVM: evolutionary bilevel cost-sensitive SVMs. EBCS-SVM handles imbalanced classification problems by simultaneously learning the support vectors and optimizing the SVM hyperparameters, which comprise the kernel parameter and misclassification costs. The resulting optimization problem is a bilevel problem, where the lower level determines the support vectors and the upper level the hyperparameters. This optimization problem is solved using an evolutionary algorithm (EA) at the upper level and sequential minimal optimization (SMO) at the lower level. These two methods work in a nested fashion, that is, the optimal support vectors help guide the search of the hyperparameters, and the lower level is initialized based on previous successful solutions. The proposed method is assessed using 70 datasets of imbalanced classification and compared with several state-of-the-art methods. The experimental results, supported by a Bayesian test, provided evidence of the effectiveness of EBCS-SVM when working with highly imbalanced datasets.  ( 2 min )
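    EBCS-SVM solves the lower level with SMO and tunes costs with an EA; neither is reproduced here. As a self-contained stand-in for the lower level only, the following sketch trains a cost-sensitive linear SVM by full-batch subgradient descent with hand-picked inverse-frequency costs (all names and values are illustrative, not the paper's):

```python
import numpy as np

def train_cost_sensitive_svm(X, y, c_pos=1.0, c_neg=1.0, lam=1e-2,
                             lr=0.1, epochs=200):
    """Linear SVM minimizing lam/2*||w||^2 + mean(c_y * hinge(y*(w.x+b)))
    by full-batch subgradient descent; c_pos/c_neg are the per-class
    misclassification costs an upper-level search would tune."""
    w, b = np.zeros(X.shape[1]), 0.0
    cost = np.where(y > 0, c_pos, c_neg)
    for _ in range(epochs):
        active = y * (X @ w + b) < 1                    # margin violations
        w -= lr * (lam * w - (cost * active * y) @ X / len(y))
        b -= lr * (-np.mean(cost * active * y))
    return w, b

# Imbalanced toy problem: 95 negatives, 5 positives.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (95, 2)), rng.normal(1, 0.5, (5, 2))])
y = np.array([-1] * 95 + [1] * 5)
w, b = train_cost_sensitive_svm(X, y, c_pos=19.0, c_neg=1.0)  # inverse-freq costs
pred = np.sign(X @ w + b)
```

    Weighting minority-class hinge violations more heavily keeps the decision boundary from collapsing onto the majority class, which is the effect the bilevel search automates.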
    DooDLeNet: Double DeepLab Enhanced Feature Fusion for Thermal-color Semantic Segmentation. (arXiv:2204.10266v1 [cs.LG])
    In this paper we present a new approach for feature fusion between RGB and LWIR Thermal images for the task of semantic segmentation for driving perception. We propose DooDLeNet, a double DeepLab architecture with specialized encoder-decoders for thermal and color modalities and a shared decoder for final segmentation. We combine two strategies for feature fusion: confidence weighting and correlation weighting. We report state-of-the-art mean IoU results on the MF dataset.  ( 2 min )
    Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications. (arXiv:2202.08916v3 [eess.IV] UPDATED)
    Image-based characterization and disease understanding involve integrative analysis of morphological, spatial, and topological information across biological scales. The development of graph convolutional networks (GCNs) has created the opportunity to address this information complexity via graph-driven architectures, since GCNs can perform feature aggregation, interaction, and reasoning with remarkable flexibility and efficiency. These GCN capabilities have spawned a new wave of research in medical imaging analysis with the overarching goal of improving quantitative disease understanding, monitoring, and diagnosis. Yet daunting challenges remain for designing the important image-to-graph transformation for multi-modality medical imaging and gaining insights into model interpretation and enhanced clinical decision support. In this review, we present recent GCN developments in the context of medical image analysis including imaging data from radiology and histopathology. We discuss the fast-growing use of graph network architectures in medical image analysis to improve disease diagnosis and patient outcomes in clinical practice. To foster cross-disciplinary research, we present GCN technical advancements and emerging medical applications, identify common challenges in the use of image-based GCNs and their extensions to model interpretation, and highlight large-scale benchmarks that promise to transform the scope of medical image studies and related graph-driven medical research.  ( 2 min )
    Physical Modeling using Recurrent Neural Networks with Fast Convolutional Layers. (arXiv:2204.10125v1 [cs.SD])
    Discrete-time modeling of acoustic, mechanical and electrical systems is a prominent topic in the musical signal processing literature. Such models are mostly derived by discretizing a mathematical model, given in terms of ordinary or partial differential equations, using established techniques. Recent work has applied the techniques of machine learning to construct such models automatically from data for the case of systems which have lumped states described by scalar values, such as electrical circuits. In this work, we examine how similar techniques are able to construct models of systems which have spatially distributed rather than lumped states. We describe several novel recurrent neural network structures, and show how they can be thought of as an extension of modal techniques. As a proof of concept, we generate synthetic data for three physical systems and show that the proposed network structures can be trained with this data to reproduce the behavior of these systems.
    OCTOPUS -- optical coherence tomography plaque and stent analysis software. (arXiv:2204.10212v1 [eess.IV])
    Compared with other imaging modalities, intravascular optical coherence tomography (IVOCT) has significant advantages for guiding percutaneous coronary interventions. To aid IVOCT research studies, we developed the Optical Coherence TOmography PlaqUe and Stent (OCTOPUS) analysis software. To automate image analysis results, the software includes several important algorithmic steps: pre-processing, deep learning plaque segmentation, machine learning identification of stent struts, and registration of pullbacks. Interactive visualization and manual editing of segmentations were included in the software. Quantifications include stent deployment characteristics (e.g., stent strut malapposition), strut level analysis, calcium angle, and calcium thickness measurements. Interactive visualizations include (x,y) anatomical, en face, and longitudinal views with optional overlays. The underlying plaque segmentation algorithm yielded excellent pixel-wise results (86.2% sensitivity and 0.781 F1 score). Using OCTOPUS on 34 new pullbacks, we determined that following automated segmentation, only 13% and 23% of frames needed any manual touch up for detailed lumen and calcification labeling, respectively. Only up to 3.8% of plaque pixels were modified, leading to an average editing time of only 7.5 seconds/frame, an approximately 80% reduction compared to manual analysis. Regarding stent analysis, sensitivity and precision were both greater than 90%, and each strut was successfully classified as either covered or uncovered with high sensitivity (94%) and specificity (90%). We introduced and evaluated the clinical application of a highly automated software package, OCTOPUS, for quantitative plaque and stent analysis in IVOCT images. The software is currently used as an offline tool for research purposes; however, the software's embedded algorithms may also be useful for real-time treatment planning.  ( 2 min )
    Sketch2PQ: Freeform Planar Quadrilateral Mesh Design via a Single Sketch. (arXiv:2201.09367v3 [cs.GR] UPDATED)
    The freeform architectural modeling process often involves two important stages: concept design and digital modeling. In the first stage, architects usually sketch the overall 3D shape and the panel layout on a physical or digital paper briefly. In the second stage, a digital 3D model is created using the sketch as a reference. The digital model needs to incorporate geometric requirements for its components, such as the planarity of panels due to consideration of construction costs, which can make the modeling process more challenging. In this work, we present a novel sketch-based system to bridge the concept design and digital modeling of freeform roof-like shapes represented as planar quadrilateral (PQ) meshes. Our system allows the user to sketch the surface boundary and contour lines under axonometric projection and supports the sketching of occluded regions. In addition, the user can sketch feature lines to provide directional guidance to the PQ mesh layout. Given the 2D sketch input, we propose a deep neural network to infer in real-time the underlying surface shape along with a dense conjugate direction field, both of which are used to extract the final PQ mesh. To train and validate our network, we generate a large synthetic dataset that mimics architect sketching of freeform quadrilateral patches. The effectiveness and usability of our system are demonstrated with quantitative and qualitative evaluation as well as user studies.  ( 2 min )
    Assessing Machine Learning Algorithms for Near-Real Time Bus Ridership Prediction During Extreme Weather. (arXiv:2204.09792v1 [stat.AP])
    Given an increasingly volatile climate, the relationship between weather and transit ridership has drawn increasing interest. However, challenges stemming from spatio-temporal dependency and non-stationarity have not been fully addressed in modelling and predicting transit ridership under the influence of weather conditions especially with the traditional statistical approaches. Drawing on three-month smart card data in Brisbane, Australia, this research adopts and assesses a suite of machine-learning algorithms, i.e., random forest, eXtreme Gradient Boosting (XGBoost) and Tweedie XGBoost, to model and predict near real-time bus ridership in relation to sudden change of weather conditions. The study confirms that there indeed exists a significant level of spatio-temporal variability of weather-ridership relationship, which produces equally dynamic patterns of prediction errors. Further comparison of model performance suggests that Tweedie XGBoost outperforms the other two machine-learning algorithms in generating overall more accurate prediction outcomes in space and time. Future research may advance the current study by drawing on larger data sets and applying more advanced machine and deep-learning approaches to provide more enhanced evidence for real-time operation of transit systems.  ( 2 min )
    The 2021 NIST Speaker Recognition Evaluation. (arXiv:2204.10242v1 [eess.AS])
    The 2021 Speaker Recognition Evaluation (SRE21) was the latest cycle of the ongoing evaluation series conducted by the U.S. National Institute of Standards and Technology (NIST) since 1996. It was the second large-scale multimodal speaker/person recognition evaluation organized by NIST (the first one being SRE19). Similar to SRE19, it featured two core evaluation tracks, namely audio and audio-visual, as well as an optional visual track. In addition to offering fixed and open training conditions, it also introduced new challenges for the community, thanks to a new multimodal (i.e., audio, video, and selfie images) and multilingual (i.e., with multilingual speakers) corpus, termed WeCanTalk, collected outside North America by the Linguistic Data Consortium (LDC). These challenges included: 1) trials (target and non-target) with enrollment and test segments originating from different domains (i.e., telephony versus video), and 2) trials (target and non-target) with enrollment and test segments spoken in different languages (i.e., cross-lingual trials). This paper presents an overview of SRE21 including the tasks, performance metric, data, evaluation protocol, results and system performance analyses. A total of 23 organizations (forming 15 teams) from academia and industry participated in SRE21 and submitted 158 valid system outputs. Evaluation results indicate: audio-visual fusion produces substantial gains in performance over audio-only or visual-only systems; top-performing speaker and face recognition systems exhibited comparable performance under the matched domain conditions present in this evaluation; and, the use of complex neural network architectures (e.g., ResNet) along with angular losses with margin, data augmentation, as well as long duration fine-tuning contributed to notable performance improvements for the audio-only speaker recognition task.  ( 2 min )
    Radio Galaxy Zoo: Using semi-supervised learning to leverage large unlabelled data-sets for radio galaxy classification under data-set shift. (arXiv:2204.08816v3 [astro-ph.GA] UPDATED)
    In this work we examine the classification accuracy and robustness of a state-of-the-art semi-supervised learning (SSL) algorithm applied to the morphological classification of radio galaxies. We test if SSL with fewer labels can achieve test accuracies comparable to the supervised state-of-the-art and whether this holds when incorporating previously unseen data. We find that for the radio galaxy classification problem considered, SSL provides additional regularisation and outperforms the baseline test accuracy. However, in contrast to model performance metrics reported on computer science benchmarking data-sets, we find that improvement is limited to a narrow range of label volumes, with performance falling off rapidly at low label volumes. Additionally, we show that SSL does not improve model calibration, regardless of whether classification is improved. Moreover, we find that when different underlying catalogues drawn from the same radio survey are used to provide the labelled and unlabelled data-sets required for SSL, a significant drop in classification performance is observed, highlighting the difficulty of applying SSL techniques under dataset shift. We show that a class-imbalanced unlabelled data pool negatively affects performance through prior probability shift, which we suggest may explain this performance drop, and that using the Fréchet distance between labelled and unlabelled data-sets as a measure of data-set shift can provide a prediction of model performance, but that for typical radio galaxy data-sets with labelled sample volumes of O(1000), the sample variance associated with this technique is high and the technique is in general not sufficiently robust to replace a train-test cycle.  ( 2 min )
    Linear convergence of a policy gradient method for finite horizon continuous time stochastic control problems. (arXiv:2203.11758v2 [math.OC] UPDATED)
    Despite its popularity in the reinforcement learning community, a provably convergent policy gradient method for general continuous space-time stochastic control problems has been elusive. This paper closes the gap by proposing a proximal gradient algorithm for feedback controls of finite-time horizon stochastic control problems. The state dynamics are continuous time nonlinear diffusions with controlled drift and possibly degenerate noise, and the objectives are nonconvex in the state and nonsmooth in the control. We prove under suitable conditions that the algorithm converges linearly to a stationary point of the control problem, and is stable with respect to policy updates by approximate gradient steps. The convergence result justifies the recent reinforcement learning heuristics that adding entropy regularization or a fictitious discount factor to the optimization objective accelerates the convergence of policy gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations.  ( 2 min )
    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation. (arXiv:2110.10461v3 [cs.LG] UPDATED)
    Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.
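    The one-pass hypergradient idea predates this work (e.g. hypergradient descent); as a minimal sketch under that simpler formulation, not the paper's algorithm, the learning rate can be adapted in-flight using the approximation d(loss)/d(alpha) = -g_t . g_{t-1}, shown here on a toy quadratic:

```python
import numpy as np

def hypergradient_sgd(grad_fn, w0, alpha0=0.01, beta=1e-4, steps=500):
    """Gradient descent whose learning rate is tuned online.

    Under the one-step-back approximation, the hypergradient of the
    loss w.r.t. the learning rate is -g_t . g_{t-1}, so alpha is
    updated in-flight with a single training episode and no restarts.
    """
    w, alpha = np.asarray(w0, float).copy(), alpha0
    g_prev = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        alpha += beta * float(g @ g_prev)  # hypergradient step on alpha
        w -= alpha * g                     # ordinary parameter step
        g_prev = g
    return w, alpha

# Toy quadratic loss 0.5*||w||^2, whose gradient is w itself.
w, alpha = hypergradient_sgd(lambda w: w, w0=[5.0, -3.0])
# The learning rate grows from its deliberately small initial value
# while w converges toward the minimum at the origin.
```

    The paper's contribution generalizes this to any continuous hyperparameter appearing in a differentiable weight update, which the toy sketch does not capture.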
    Fink: early supernovae Ia classification using active learning. (arXiv:2111.11438v2 [astro-ph.IM] UPDATED)
    We describe how the Fink broker early supernova Ia classifier optimizes its ML classifications by employing an active learning (AL) strategy. We demonstrate the feasibility of implementation of such strategies in the current Zwicky Transient Facility (ZTF) public alert data stream. We compare the performance of two AL strategies: uncertainty sampling and random sampling. Our pipeline consists of 3 stages: feature extraction, classification and learning strategy. Starting from an initial sample of 10 alerts (5 SN Ia and 5 non-Ia), we let the algorithm identify which alert should be added to the training sample. The system is allowed to evolve through 300 iterations. Our data set consists of 23 840 alerts from the ZTF with confirmed classification via cross-match with the SIMBAD database and the Transient Name Server (TNS), 1 600 of which were SNe Ia (1 021 unique objects). The data configuration, after the learning cycle was completed, consists of 310 alerts for training and 23 530 for testing. Averaging over 100 realizations, the classifier achieved 89% purity and 54% efficiency. From 1 November 2020 to 31 October 2021, Fink applied its early supernova Ia module to the ZTF stream and communicated promising SN Ia candidates to the TNS. From the 535 spectroscopically classified Fink candidates, 459 (86%) were proven to be SNe Ia. Our results confirm the effectiveness of active learning strategies for guiding the construction of optimal training samples for astronomical classifiers. It demonstrates in real data that the performance of learning algorithms can be highly improved without the need of extra computational resources or overwhelmingly large training samples. This is, to our knowledge, the first application of AL to real alerts data.
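    Fink's pipeline details are not given in the abstract; as a hedged, generic sketch of the uncertainty-sampling arm only, the query-label-retrain loop can be demonstrated with a nearest-centroid classifier on synthetic "alert features" (all names and data are illustrative):

```python
import numpy as np

def class_centroids(X, y, labeled):
    """Mean feature vector of each class over the labeled pool."""
    return np.stack([X[[i for i in labeled if y[i] == c]].mean(axis=0)
                     for c in (0, 1)])

def uncertainty_sampling(X, y, init_idx, n_queries):
    """Repeatedly query the point whose two centroid distances are
    closest to a tie (most uncertain) and add it to the labeled pool."""
    labeled = list(init_idx)
    for _ in range(n_queries):
        d = np.linalg.norm(X[:, None, :]
                           - class_centroids(X, y, labeled)[None], axis=2)
        score = -np.abs(d[:, 0] - d[:, 1])   # high score = near the boundary
        score[labeled] = -np.inf             # never re-query a labeled alert
        labeled.append(int(score.argmax()))
    d = np.linalg.norm(X[:, None, :]
                       - class_centroids(X, y, labeled)[None], axis=2)
    return labeled, (d.argmin(axis=1) == y).mean()

# Two well-separated blobs standing in for Ia / non-Ia alert features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal((-2, 0), 0.7, (50, 2)),
               rng.normal((2, 0), 0.7, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
labeled, acc = uncertainty_sampling(X, y, init_idx=[0, 50], n_queries=20)
```

    Because the budget goes to the most ambiguous points rather than random ones, a small labeled pool can match the accuracy of a much larger random sample, which is the effect the paper verifies on real alert data.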
    Path sampling of recurrent neural networks by incorporating known physics. (arXiv:2203.00597v2 [cond-mat.dis-nn] UPDATED)
    Recurrent neural networks have seen widespread use in modeling dynamical systems in varied domains such as weather prediction, text prediction and several others. Often one wishes to supplement the experimentally observed dynamics with prior knowledge or intuition about the system. While the recurrent nature of these networks allows them to model arbitrarily long memories in the time series used in training, it makes it harder to impose prior knowledge or intuition through generic constraints. In this work, we present a path sampling approach based on the principle of Maximum Caliber that allows us to include generic thermodynamic or kinetic constraints into recurrent neural networks. We show the method here for a widely used type of recurrent neural network known as the long short-term memory (LSTM) network in the context of supplementing time series collected from different application domains. These include classical Molecular Dynamics of a protein and Monte Carlo simulations of an open quantum system continuously losing photons to the environment and displaying Rabi oscillations. Our method can be easily generalized to other generative artificial intelligence models and to generic time series in different areas of physical and social sciences, where one wishes to supplement limited data with intuition or theory based corrections.
    Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?. (arXiv:2204.09664v2 [cs.LG] UPDATED)
    We study the theory of neural network (NN) from the lens of classical nonparametric regression problems with a focus on NN's ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function basis, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.
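    The weight-decay-to-sparsity correspondence can be made concrete in the simplest factorized setting; the following is the standard depth-$L$ identity underlying such results, not the paper's exact statement for Parallel NNs:

```latex
\min_{u v = c}\ \frac{\lambda}{2}\left(u^{2}+v^{2}\right) = \lambda\,\lvert c\rvert,
\qquad
\min_{\prod_{l=1}^{L} u_l = c}\ \frac{\lambda}{2}\sum_{l=1}^{L} u_l^{2}
  = \frac{\lambda L}{2}\,\lvert c\rvert^{2/L},
```

    by the AM-GM inequality, with equality when all factors share the magnitude $\lvert c\rvert^{1/L}$. Hence an $\ell_2$ (weight-decay) penalty on the factors of a depth-$L$ product acts as an $\ell_p$ penalty on the product with $p = 2/L$, which gives $0<p<1$ as soon as $L>2$, consistent with the sparsity regime the abstract describes.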
    An Improved Transfer Model: Randomized Transferable Machine. (arXiv:2011.13629v2 [cs.LG] UPDATED)
    Feature-based transfer is one of the most effective methodologies for transfer learning. Existing studies usually assume that the learned new feature representation is \emph{domain-invariant}, and thus train a transfer model $\mathcal{M}$ on the source domain. In this paper, we consider a more realistic scenario where the new feature representation is suboptimal and a small divergence still exists across domains. We propose a new transfer model called the Randomized Transferable Machine (RTM) to handle such small divergence between domains. Specifically, we work on the new source and target data learned from existing feature-based transfer methods. The key idea is to enlarge the source training data population by randomly corrupting the new source data with noise, and then train a transfer model $\widetilde{\mathcal{M}}$ that performs well on all the corrupted source data populations. In principle, the more corruptions are made, the higher the probability that the new target data are covered by the constructed source data populations, and thus the better the transfer performance achieved by $\widetilde{\mathcal{M}}$. The ideal case uses infinitely many corruptions, which is infeasible in practice. We therefore develop a marginalized solution that trains an $\widetilde{\mathcal{M}}$ without performing any explicit corruption, yet is equivalent to training on infinitely many corrupted source data populations. We further propose two instantiations of $\widetilde{\mathcal{M}}$, which we theoretically show to be superior to the conventional transfer model $\mathcal{M}$. More importantly, both instantiations have closed-form solutions, leading to a fast and efficient training process. Experiments on various real-world transfer tasks show that RTM is a promising transfer model.
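    The marginalization trick can be sketched for a linear model under feature dropout noise. This is an illustrative assumption about the noise model and estimator, in the style of marginalized denoising, not the paper's exact RTM instantiations: replacing the empirical second moments with their expectations under corruption yields a closed-form solution without generating any corrupted copies.

```python
import numpy as np

# Illustrative marginalized training: fit a linear model to infinitely many
# dropout-corrupted copies of X in closed form. With independent feature
# dropout keeping each entry with probability `keep`:
#   E[x_tilde x_tilde^T] = keep^2 * S off-diagonal, keep * S on-diagonal,
#   E[x_tilde]           = keep * x,
# where S = X^T X is the uncorrupted scatter matrix.
def marginalized_ridge(X, y, keep=0.9, reg=1e-6):
    S = X.T @ X
    EQ = keep**2 * S
    np.fill_diagonal(EQ, keep * np.diag(S))   # diagonal scales with keep, not keep^2
    EP = keep * (X.T @ y)
    return np.linalg.solve(EQ + reg * np.eye(X.shape[1]), EP)
```

With `keep=1.0` the estimator reduces to ordinary least squares, which is a quick sanity check on the moment algebra.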
    Towards Deepening Graph Neural Networks: A GNTK-based Optimization Perspective. (arXiv:2103.03113v3 [cs.LG] UPDATED)
    Graph convolutional networks (GCNs) and their variants have achieved great success in dealing with graph-structured data. Nevertheless, it is well known that deep GCNs suffer from the over-smoothing problem, where node representations tend to be indistinguishable as more layers are stacked up. The theoretical research to date on deep GCNs has focused primarily on expressive power rather than trainability, an optimization perspective. Compared to expressivity, trainability attempts to address a more fundamental question: Given a sufficiently expressive space of models, can we successfully find a good solution via gradient descent-based optimizers? This work fills this gap by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory under gradient descent for wide GCNs. We formulate the asymptotic behaviors of the GNTK in the large-depth regime, which enables us to reveal that the trainability of wide and deep GCNs drops at an exponential rate in the optimization process. Additionally, we extend our theoretical framework to analyze residual connection-based techniques, which are found to be only able to mildly mitigate the exponential decay of trainability. Inspired by our theoretical insights on trainability, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method, to alleviate the exponential decay problem more fundamentally. Experimental evaluation consistently confirms that our proposed method achieves better results than relevant counterparts in both the infinite-width and finite-width settings.
    Hybrid Memoised Wake-Sleep: Approximate Inference at the Discrete-Continuous Interface. (arXiv:2107.06393v2 [cs.CV] UPDATED)
    Modeling complex phenomena typically involves the use of both discrete and continuous variables. Such a setting applies across a wide range of problems, from identifying trends in time-series data to performing effective compositional scene understanding in images. Here, we propose Hybrid Memoised Wake-Sleep (HMWS), an algorithm for effective inference in such hybrid discrete-continuous models. Prior approaches to learning suffer as they need to perform repeated expensive inner-loop discrete inference. We build on a recent approach, Memoised Wake-Sleep (MWS), which alleviates part of the problem by memoising discrete variables, and extend it to allow for a principled and effective way to handle continuous variables by learning a separate recognition model used for importance-sampling based approximate inference and marginalization. We evaluate HMWS in the GP-kernel learning and 3D scene understanding domains, and show that it outperforms current state-of-the-art inference methods.
    Out-of-distribution generalization for learning quantum dynamics. (arXiv:2204.10268v1 [quant-ph])
    Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are assumed to be drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. In this work, we prove out-of-distribution generalization for the task of learning an unknown unitary using a QNN and for a broad class of training and testing distributions. In particular, we show that one can learn the action of a unitary on entangled states using only product state training data. We numerically illustrate this by showing that the evolution of a Heisenberg spin chain can be learned using only product training states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics using near term quantum computers and quantum experiments, and further opens up new methods for both the classical and quantum compilation of quantum circuits.
    Anti-Jamming Games in Multi-Band Wireless Ad Hoc Networks. (arXiv:2111.11178v2 [cs.IT] UPDATED)
    For multi-band wireless ad hoc networks of multiple users, an anti-jamming game between the users and a jammer is studied. In this game, the users (resp. jammer) want to maximize (resp. minimize) the expected rewards of the users, taking into account various factors such as communication rate, hopping cost, and jamming loss. We analyze the arms race of the game and derive an optimal frequency hopping policy at each stage of the arms race based on the Markov decision process (MDP). It is analytically shown that the arms race reaches an equilibrium after a few rounds, and a frequency hopping policy and a jamming strategy at the equilibrium are characterized. We propose two kinds of collision avoidance protocols to ensure that at most one user communicates in each frequency band, and provide various numerical results that show the effects of the reward parameters and collision avoidance protocols on the optimal frequency hopping policy and the expected rewards at the equilibrium. Moreover, we discuss equilibria for the case where the jammer adopts some unpredictable jamming strategies.
    Distributed Learning for Vehicular Dynamic Spectrum Access in Autonomous Driving. (arXiv:2204.10179v1 [cs.NI])
    Reliable wireless communication between autonomously driving cars is one of the fundamental needs for guaranteeing passenger safety and comfort. However, when the number of communicating cars increases, the transmission quality may be significantly degraded due to excessively high occupancy of the used frequency band. In this paper, we concentrate on the autonomous vehicle-platooning use case, where intra-platoon communication is done in a dynamically selected frequency band other than the one nominally devoted to such purposes. The carrier selection is done in a flexible manner with the support of a context database located at the roadside unit (the edge of the wireless communication infrastructure). However, as the database delivers only context information to the platoon leaders, the final decision is made separately by the individual platoons, following the suggestions made by artificial intelligence algorithms. In this work, we concentrate on a lightweight Q-learning solution that could be successfully implemented in each car for dynamic channel selection.
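    A lightweight Q-learner of the kind the abstract describes can be sketched in a few lines. The reward model (1 for a successful intra-platoon transmission, 0 otherwise), the stateless formulation, and the learning parameters are illustrative assumptions, not taken from the paper.

```python
import random

# Minimal per-vehicle Q-learner for dynamic channel selection: each action
# is a candidate frequency band; an epsilon-greedy policy trades off
# exploring bands against exploiting the band with the highest estimate.
def q_learn_channels(success_prob, episodes=5000, alpha=0.1, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = [0.0] * len(success_prob)
    for _ in range(episodes):
        if rng.random() < eps:
            a = rng.randrange(len(q))                       # explore
        else:
            a = max(range(len(q)), key=q.__getitem__)       # exploit
        r = 1.0 if rng.random() < success_prob[a] else 0.0  # simulated outcome
        q[a] += alpha * (r - q[a])   # stateless update: target is the reward
    return q
```

In the deployed setting the "success probability" would be replaced by the actual transmission outcome observed on the selected band, with context hints from the roadside database narrowing the candidate set.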
    Merging of neural networks. (arXiv:2204.09973v1 [cs.LG])
    We propose a simple scheme for merging two neural networks trained with different starting initializations into a single one with the same size as the originals. We do this by carefully selecting channels from each input network. Our procedure can be used as a finalization step after trying multiple starting seeds, to avoid an unlucky one. We also show that training two networks and merging them leads to better performance than training a single network for an extended period of time. Availability: https://github.com/fmfi-compbio/neural-network-merging
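    The channel-selection idea can be sketched for a single linear layer. The selection criterion below (largest row norm) is an assumption for illustration; the paper's procedure is more careful about which channels to keep and how downstream layers are adjusted.

```python
import numpy as np

# Hedged sketch of channel-selection merging: to merge two trained layers of
# shape (out_channels, in_features) into one layer of the same size, keep
# the half of the output channels with the largest norm from each network.
def merge_layers(w_a, w_b):
    k = w_a.shape[0] // 2
    top_a = np.argsort(-np.linalg.norm(w_a, axis=1))[:k]
    top_b = np.argsort(-np.linalg.norm(w_b, axis=1))[:w_a.shape[0] - k]
    return np.vstack([w_a[top_a], w_b[top_b]])
```

A full merge would apply this layer by layer and keep the input dimensions of each subsequent layer consistent with the channels selected in the previous one.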
    OUR-GAN: One-shot Ultra-high-Resolution Generative Adversarial Networks. (arXiv:2202.13799v2 [cs.CV] UPDATED)
    We propose OUR-GAN, the first one-shot ultra-high-resolution (UHR) image synthesis framework that generates non-repetitive images with 4K or higher resolution from a single training image. OUR-GAN generates a visually coherent image at low resolution and then gradually increases the resolution by super-resolution. Since OUR-GAN learns from a real UHR image, it can synthesize large-scale shapes with fine details while maintaining long-range coherence, which is difficult with conventional generative models that generate large images based on the patch distribution learned from relatively small images. OUR-GAN applies seamless subregion-wise super-resolution that synthesizes 4K or higher UHR images with limited memory, preventing discontinuity at the boundary. Additionally, OUR-GAN improves visual coherence while maintaining diversity by adding vertical positional embeddings to the feature maps. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherency, and diversity compared with existing methods. The synthesized images are presented at https://anonymous-62348.github.io.
    A System for Interactive Examination of Learned Security Policies. (arXiv:2204.01126v2 [cs.CR] UPDATED)
    We present a system for interactive examination of learned security policies. It allows a user to traverse episodes of Markov decision processes in a controlled manner and to track the actions triggered by security policies. Similar to a software debugger, a user can continue or halt an episode at any time step and inspect parameters and probability distributions of interest. The system enables insight into the structure of a given policy and into the behavior of a policy in edge cases. We demonstrate the system with a network intrusion use case. We examine the evolution of an IT infrastructure's state and the actions prescribed by security policies while an attack occurs. The policies for the demonstration have been obtained through a reinforcement learning approach that includes a simulation system, where policies are incrementally learned, and an emulation system that produces statistics that drive the simulation runs.
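    The debugger-style traversal the description suggests can be sketched as a loop that steps a policy through an MDP and flags the time steps where the user would pause to inspect state and action. The interfaces below are hypothetical, not the system's actual API.

```python
# Sketch of debugger-style episode traversal: step a policy through an MDP
# for a fixed horizon, recording (time step, state, action, paused?) so a
# user can halt at breakpoints and inspect quantities of interest.
def traverse(policy, step_fn, state, breakpoints, horizon):
    trace = []
    for t in range(horizon):
        action = policy(state)                     # action prescribed by the policy
        trace.append((t, state, action, t in breakpoints))
        state = step_fn(state, action)             # environment transition
    return trace
```

In the real system the "inspect" step would surface policy parameters and probability distributions rather than just recording a tuple.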
    INSPIRE: Distributed Bayesian Optimization for ImproviNg SPatIal REuse in Dense WLANs. (arXiv:2204.10184v1 [cs.NI])
    WLANs, which have overtaken wired networks to become the primary means of connecting devices to the Internet, are prone to performance issues due to the scarcity of space in the radio spectrum. As a response, IEEE 802.11ax and subsequent amendments aim at increasing the spatial reuse of a radio channel by allowing the dynamic update of two key parameters in wireless transmission: the transmission power (TX_POWER) and the sensitivity threshold (OBSS_PD). In this paper, we present INSPIRE, a distributed solution performing local Bayesian optimizations based on Gaussian processes to improve the spatial reuse in WLANs. INSPIRE makes no explicit assumptions about the topology of WLANs and favors altruistic behaviors of the access points, leading them to find adequate configurations of their TX_POWER and OBSS_PD parameters for the "greater good" of the WLANs. We demonstrate the superiority of INSPIRE over other state-of-the-art strategies using the ns-3 simulator and two examples inspired by real-life deployments of dense WLANs. Our results show that, in only a few seconds, INSPIRE is able to drastically increase the quality of service of operational WLANs by improving their fairness and throughput.
    BABD: A Bitcoin Address Behavior Dataset for Address Behavior Pattern Analysis. (arXiv:2204.05746v2 [cs.CR] UPDATED)
    Cryptocurrencies are no longer just the preferred option for cybercriminal activities on darknets, due to their increasing adoption in mainstream applications. This is partly due to the transparency associated with the underpinning ledgers, where any individual can access the record of a transaction on the public ledger. In this paper, we build a dataset comprising Bitcoin transactions between 12 July 2019 and 26 May 2021. This dataset (hereafter referred to as BABD-13) contains 13 types of Bitcoin addresses, 5 categories of indicators with 148 features, and 544,462 labeled samples. We then evaluate our proposed dataset on common machine learning models, namely: k-nearest neighbors algorithm, decision tree, random forest, multilayer perceptron, and XGBoost. The results show that the accuracy rates of these machine learning models on our proposed dataset are between 93.24% and 96.71%. We also analyze the proposed features and their relationships from the experiments, and propose a k-hop subgraph generation algorithm to extract a k-hop subgraph from the entire Bitcoin transaction graph constructed as a directed heterogeneous multigraph, starting from a specific Bitcoin address node (e.g., a known transaction associated with a criminal investigation).
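    The k-hop extraction step can be sketched as a breadth-first search from the seed address. This simplification works on a plain adjacency dict and ignores the directed heterogeneous multigraph structure and edge attributes the paper handles.

```python
from collections import deque

# BFS-based k-hop subgraph extraction: collect every node reachable from
# `start` within at most k hops in an adjacency-dict graph.
def k_hop_subgraph(adj, start, k):
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == k:
            continue                      # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, d + 1))
    return seen
```

On the full Bitcoin graph one would additionally keep the induced edges and their transaction attributes, not just the node set.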
    How Well Do Sparse Imagenet Models Transfer?. (arXiv:2111.13445v5 [cs.CV] UPDATED)
    Transfer learning is a classic paradigm by which models pretrained on large "upstream" datasets are adapted to yield good results on "downstream" specialized datasets. Generally, more accurate models on the "upstream" dataset tend to provide better transfer accuracy "downstream". In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned - that is, compressed by sparsifying their connections. We consider transfer using unstructured pruned models obtained by applying several state-of-the-art pruning methods, including magnitude-based, second-order, re-growth, lottery-ticket, and regularization approaches, in the context of twelve standard transfer tasks. In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups. At the same time, we observe and analyze significant differences in the behaviour of different pruning methods.
    Towards Resolving Propensity Contradiction in Offline Recommender Learning. (arXiv:1910.07295v6 [stat.ML] UPDATED)
    We study offline recommender learning from explicit rating feedback in the presence of selection bias. A currently promising solution for the bias is inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from propensity estimation bias. In fact, most previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction: IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data.
    Energy-Efficient Parking Analytics System using Deep Reinforcement Learning. (arXiv:2202.08973v2 [cs.CV] UPDATED)
    Advances in deep vision techniques and ubiquity of smart cameras will drive the next generation of video analytics. However, video analytics applications consume vast amounts of energy as both deep learning techniques and cameras are power-hungry. In this paper, we focus on a parking video analytics platform and propose RL-CamSleep, a deep reinforcement learning-based technique, to actuate the cameras to reduce the energy footprint while retaining the system's utility. Our key insight is that many video-analytics applications do not always need to be operational, and we can design policies to activate video analytics only when necessary. Moreover, our work is complementary to existing work that focuses on improving hardware and software efficiency. We evaluate our approach on a city-scale parking dataset having 76 streets spread across the city. Our analysis demonstrates how streets have various parking patterns, highlighting the importance of an adaptive policy. Our approach can learn such an adaptive policy that can reduce the average energy consumption by 76.38% and achieve an average accuracy of more than 98% in performing video analytics.
    The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. (arXiv:2110.07732v3 [cs.LG] UPDATED)
    Despite progress across a broad range of applications, Transformers have limited success in systematic generalization. The situation is especially frustrating in the case of algorithmic tasks, where they often fail to find intuitive solutions that route relevant information to the right node/operation at the right time in the grid represented by Transformer columns. To facilitate the learning of useful control flow, we propose two modifications to the Transformer architecture, copy gate and geometric attention. Our novel Neural Data Router (NDR) achieves 100% length generalization accuracy on the classic compositional table lookup task, as well as near-perfect accuracy on the simple arithmetic task and a new variant of ListOps testing for generalization across computational depths. NDR's attention and gating patterns tend to be interpretable as an intuitive form of neural routing. Our code is public.
    On the Certified Robustness for Ensemble Models and Beyond. (arXiv:2107.10873v2 [cs.LG] UPDATED)
    Recent studies show that deep neural networks (DNNs) are vulnerable to adversarial examples, which aim to mislead DNNs by adding perturbations of small magnitude. To defend against such attacks, both empirical and theoretical defense approaches have been extensively studied for a single ML model. In this work, we aim to analyze and provide the certified robustness for ensemble ML models, together with the sufficient and necessary conditions of robustness for different ensemble protocols. Although ensemble models are empirically shown to be more robust than a single model, surprisingly, we find that in terms of certified robustness the standard ensemble models only achieve marginal improvement compared to a single model. Thus, to explore the conditions that guarantee certifiably robust ensemble ML models, we first prove that diversified gradients and large confidence margins are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We then provide the bounded model-smoothness analysis based on the proposed Ensemble-before-Smoothing strategy. We also prove that an ensemble model can always achieve higher certified robustness than a single base model under mild conditions. Inspired by the theoretical findings, we propose the lightweight Diversity Regularized Training (DRT) to train certifiably robust ensemble ML models. Extensive experiments show that our DRT-enhanced ensembles can consistently achieve higher certified robustness than existing single and ensemble ML models, demonstrating state-of-the-art certified L2-robustness on the MNIST, CIFAR-10, and ImageNet datasets.
    A Survey and Perspective on Artificial Intelligence for Security-Aware Electronic Design Automation. (arXiv:2204.09579v2 [cs.LG] UPDATED)
    Artificial intelligence (AI) and machine learning (ML) techniques have been increasingly used in several fields to improve performance and the level of automation. In recent years, this use has increased exponentially due to the advancement of high-performance computing and the ever-increasing size of data. One such field is hardware design; specifically, the design of digital and analog integrated circuits~(ICs), where AI/ML techniques have been extensively used to address ever-increasing design complexity, aggressive time-to-market, and the growing number of ubiquitous interconnected devices (IoT). However, the security concerns and issues related to IC design have been highly overlooked. In this paper, we summarize the state of the art in AI/ML for circuit design/optimization, security and engineering challenges, research in security-aware CAD/EDA, and future research directions and needs for using AI/ML for security-aware circuit design.
    ESS: Learning Event-based Semantic Segmentation from Still Images. (arXiv:2203.10016v1 [cs.CV] CROSS LISTED)
    Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy, chiefly due to the novelty of the sensor and the lack of high-quality, labeled datasets. In this work, we introduce ESS, which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, to spur further research in event-based semantic segmentation, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for new and exciting research directions in new fields previously inaccessible for event cameras.
    Modeling and Predicting Popularity Dynamics via Deep Learning Attention Mechanism. (arXiv:1811.02117v2 [cs.SI] UPDATED)
    An ability to predict the popularity dynamics of individual items within a complex evolving system has important implications in a wide range of domains. Here we propose a deep learning attention mechanism to model the process through which individual items gain their popularity. We analyze the interpretability of the model with the four key phenomena confirmed independently in previous studies of long-term popularity dynamics quantification: the intrinsic quality, the aging effect, the recency effect, and the Matthew effect. We analyze the effectiveness of introducing the attention model in popularity dynamics prediction. Extensive experiments on a large real-world citation dataset demonstrate that the designed deep learning attention mechanism possesses remarkable power at predicting long-term popularity dynamics. It consistently outperforms the existing methods and achieves a significant performance improvement.
    Deep Bayesian Active Learning, A Brief Survey on Recent Advances. (arXiv:2012.08044v2 [cs.LG] UPDATED)
    Active learning frameworks offer efficient data annotation without remarkable accuracy degradation. In other words, active learning starts training the model with a small amount of labeled data while exploring the space of unlabeled data in order to select the most informative samples to be labeled. Generally speaking, representing the uncertainty is crucial in any active learning framework; however, standard deep learning methods are not capable of either representing or manipulating model uncertainty. At the same time, from the real-world application perspective, uncertainty representation is getting more and more attention in the machine learning community. Deep Bayesian active learning frameworks, and generally any Bayesian active learning setting, provide practical considerations in the model that allow training with small data while representing the model uncertainty for further efficient training. In this paper, we briefly survey recent advances in Bayesian active learning and in particular deep Bayesian active learning frameworks.
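    The query step common to the surveyed frameworks can be sketched with a predictive-entropy acquisition function. This is one simple score among those discussed in the active learning literature (BALD, variation ratios, and others follow the same select-the-most-uncertain pattern); the specifics here are illustrative.

```python
import numpy as np

# Entropy-based acquisition: given per-sample class probabilities for an
# unlabeled pool, return the indices of the n_query most uncertain samples
# (highest predictive entropy) to send for labeling.
def entropy_query(probs, n_query):
    ent = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    return np.argsort(-ent)[:n_query]
```

In a deep Bayesian setting, `probs` would typically be the average of several stochastic forward passes (e.g. with dropout left on), so the entropy reflects model uncertainty rather than a single point estimate.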
    Addressing Tactic Volatility in Self-Adaptive Systems Using Evolved Recurrent Neural Networks and Uncertainty Reduction Tactics. (arXiv:2204.10308v1 [cs.LG])
    Self-adaptive systems frequently use tactics to perform adaptations. Tactic examples include the implementation of additional security measures when an intrusion is detected, or activating a cooling mechanism when temperature thresholds are surpassed. Tactic volatility occurs in real-world systems and is defined as variable behavior in the attributes of a tactic, such as its latency or cost. A system's inability to effectively account for tactic volatility adversely impacts its efficiency and resiliency against the dynamics of real-world environments. To enable systems to operate efficiently in the presence of tactic volatility, we propose a Tactic Volatility Aware (TVA-E) process utilizing evolved Recurrent Neural Networks (eRNN) to provide accurate tactic predictions. TVA-E is also the first known process to take advantage of uncertainty reduction tactics to provide additional information to the decision-making process and reduce uncertainty. TVA-E easily integrates into popular adaptation processes, enabling it to immediately benefit a large number of existing self-adaptive systems. Simulations using 52,106 tactic records demonstrate that: I) eRNN is an effective prediction mechanism, II) TVA-E represents an improvement over existing state-of-the-art processes in accounting for tactic volatility, and III) uncertainty reduction tactics are beneficial in accounting for tactic volatility. The developed dataset and tool can be found at https://tacticvolatility.github.io/
    Lessons on Parameter Sharing across Layers in Transformers. (arXiv:2104.06022v3 [cs.CL] UPDATED)
    We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique, which shares the parameters of one layer with all layers, as in Universal Transformers (Dehghani et al., 2019), to improve efficiency in computational time. We propose three strategies, Sequence, Cycle, and Cycle (rev), to assign parameters to each layer. Experimental results show that the proposed strategies are efficient in terms of parameter size and computational time. Moreover, we show that the proposed strategies are also effective in configurations with large amounts of training data, such as the recent WMT competition.
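    The three assignment strategies can be sketched as maps from layer index to parameter-group index. This reflects one natural reading of the strategy names; edge cases (e.g. when the layer count is not a multiple of the group count) may differ from the paper.

```python
# Assign M distinct parameter groups to N Transformer layers under the
# three strategies: Sequence uses contiguous blocks, Cycle repeats the
# groups in order, and Cycle (rev) repeats them but reverses the final pass.
def assign(strategy, n_layers, m_groups):
    if strategy == "sequence":       # e.g. 0,0,1,1,2,2
        return [i * m_groups // n_layers for i in range(n_layers)]
    if strategy == "cycle":          # e.g. 0,1,2,0,1,2
        return [i % m_groups for i in range(n_layers)]
    if strategy == "cycle_rev":      # e.g. 0,1,2,2,1,0
        ids = [i % m_groups for i in range(n_layers - m_groups)]
        return ids + list(range(m_groups - 1, -1, -1))
    raise ValueError(strategy)
```

For example, with 6 layers and 3 groups, `assign("sequence", 6, 3)` gives `[0, 0, 1, 1, 2, 2]`, so each pair of adjacent layers shares one parameter set.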
    DropMessage: Unifying Random Dropping for Graph Neural Networks. (arXiv:2204.10037v1 [cs.LG])
    Graph Neural Networks (GNNs) are powerful tools for graph representation learning. Despite their rapid development, GNNs also face some challenges, such as over-fitting, over-smoothing, and non-robustness. Previous works indicate that these problems can be alleviated by random dropping methods, which integrate noise into models by randomly masking parts of the input. However, some open problems of random dropping on GNNs remain unsolved. First, it is challenging to find a universal method that is suitable for all cases, considering the divergence of different datasets and models. Second, random noise introduced to GNNs causes incomplete coverage of parameters and an unstable training process. In this paper, we propose a novel random dropping method called DropMessage, which performs the dropping operation directly on the message matrix and can be applied to any message-passing GNN. Furthermore, we elaborate on the superiority of DropMessage: it stabilizes the training process by reducing sample variance, and it keeps information diversity from the perspective of information theory, which makes it a theoretical upper bound of other methods. We also unify existing random dropping methods into our framework and analyze their effects on GNNs. To evaluate our proposed method, we conduct experiments aimed at multiple tasks on five public datasets and two industrial datasets with various backbone models. The experimental results show that DropMessage has the advantages of both effectiveness and generalization.
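    The core operation, dropping entries of the message matrix rather than nodes or edges, can be sketched in a few lines. The inverted-dropout rescaling below is a standard convention assumed for illustration, not a detail taken from the paper.

```python
import numpy as np

# Sketch of DropMessage-style dropping: each entry of the propagated
# message matrix (one row per edge-message, one column per feature) is
# kept independently with probability 1 - p, then rescaled so the
# expectation of each entry is unchanged.
def drop_message(messages, p, rng):
    mask = rng.random(messages.shape) >= p
    return messages * mask / (1.0 - p)
```

Because the mask is drawn per entry rather than per node (DropNode) or per edge (DropEdge), more distinct dropping patterns are possible, which is the sense in which the abstract calls it a unifying, fine-grained scheme.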
    STONet: A Neural-Operator-Driven Spatio-temporal Network. (arXiv:2204.08414v2 [cs.LG] UPDATED)
    Graph-based spatio-temporal neural networks are effective at modeling the spatial dependency among discrete points sampled irregularly from unstructured grids, thanks to the great expressiveness of graph neural networks. However, these models are usually spatially-transductive -- they only fit the signals for the discrete spatial nodes fed into the model and are unable to generalize to `unseen' spatial points in a zero-shot manner. In comparison, for forecasting tasks on continuous space such as temperature prediction on the earth's surface, the \textit{spatially-inductive} property allows the model to generalize to any point in the spatial domain, demonstrating the model's ability to learn the underlying mechanisms or physics laws of the system, rather than simply fit the signals. Besides, in the temporal domain, \textit{irregularly-sampled} time series, e.g. data with missing values, urge models to be temporally-continuous. Motivated by these two issues, we propose a spatio-temporal framework based on neural operators for PDEs, which learns the underlying mechanisms governing the dynamics of spatially-continuous physical quantities. Experiments show our model's improved performance in forecasting spatially-continuous physical quantities, its superior generalization to unseen spatial points, and its ability to handle temporally-irregular data.
    MRAM-based Analog Sigmoid Function for In-memory Computing. (arXiv:2204.09918v1 [cs.ET])
    We propose an analog implementation of the transcendental activation function leveraging two spin-orbit torque magnetoresistive random-access memory (SOT-MRAM) devices and a CMOS inverter. The proposed analog neuron circuit consumes 1.8-27x less power, and occupies 2.5-4931x smaller area, compared to the state-of-the-art analog and digital implementations. Moreover, the developed neuron can be readily integrated with memristive crossbars without requiring any intermediate signal conversion units. The architecture-level analyses show that a fully-analog in-memory computing (IMC) circuit that uses our SOT-MRAM neuron along with an SOT-MRAM based crossbar can achieve more than 1.1x, 12x, and 13.3x reduction in power, latency, and energy, respectively, compared to a mixed-signal implementation with analog memristive crossbars and digital neurons. Finally, through cross-layer analyses, we provide a guide on how varying the device-level parameters in our neuron can affect the accuracy of a multilayer perceptron (MLP) for MNIST classification.
    Scale Dependencies and Self-Similarity Through Wavelet Scattering Covariance. (arXiv:2204.10177v1 [physics.data-an])
    We introduce a scattering covariance matrix which provides non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint covariance across time and scales of complex wavelet coefficients and their modulus. This covariance is nearly diagonalized by a second wavelet transform, which defines the scattering covariance. We show that this set of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. This is analyzed for a variety of processes, including fractional Brownian motions, Poisson, multifractal random walks and Hawkes processes. We prove that self-similar processes have a scattering covariance matrix which is scale invariant. This property can be estimated numerically and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering covariance coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.  ( 2 min )
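    The core ingredients of the abstract above -- wavelet coefficients at successive scales and the covariance of their moduli across scales -- can be illustrated with a toy sketch. This is not the paper's construction: it uses a real Haar transform instead of a complex wavelet, and function names are ours.

```python
import numpy as np

def haar_details(x, n_scales):
    """Detail coefficients of a real Haar transform at successive
    dyadic scales (a stand-in for the paper's complex wavelet
    transform). The detail at scale j has length len(x) / 2**(j+1)."""
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(n_scales):
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return details

def modulus_cov(d1, d2):
    """Covariance between |coefficients| at two adjacent scales,
    after pooling the finer scale in pairs to align in time."""
    m1 = np.abs(d1)[: 2 * len(d2)].reshape(-1, 2).mean(axis=1)
    m2 = np.abs(d2)
    return float(np.cov(m1, m2)[0, 1])

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
d = haar_details(x, 3)
cov12 = modulus_cov(d[0], d[1])
```

For white Gaussian noise this cross-scale modulus covariance is close to zero; dependence between scales is exactly what distinguishes multifractal or self-similar processes in the paper's analysis.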
    NetSentry: A Deep Learning Approach to Detecting Incipient Large-scale Network Attacks. (arXiv:2202.09873v2 [cs.CR] UPDATED)
    Machine Learning (ML) techniques are increasingly adopted to tackle ever-evolving high-profile network attacks, including DDoS, botnet, and ransomware, due to their unique ability to extract complex patterns hidden in data streams. These approaches are however routinely validated with data collected in the same environment, and their performance degrades when deployed in different network topologies and/or applied on previously unseen traffic, as we uncover. This suggests malicious/benign behaviors are largely learned superficially and ML-based Network Intrusion Detection Systems (NIDS) need revisiting to be effective in practice. In this paper we dive into the mechanics of large-scale network attacks, with a view to understanding how to use ML for Network Intrusion Detection (NID) in a principled way. We reveal that, although cyberattacks vary significantly in terms of payloads, vectors and targets, their early stages, which are critical to successful attack outcomes, share many similarities and exhibit important temporal correlations. Therefore, we treat NID as a time-sensitive task and propose NetSentry, perhaps the first NIDS of its kind, which builds on Bidirectional Asymmetric LSTM (Bi-ALSTM), an original ensemble of sequential neural models, to detect network threats before they spread. We cross-evaluate NetSentry using two practical datasets, training on one and testing on the other, and demonstrate F1 score gains above 33% over the state-of-the-art, as well as up to 3 times higher rates of detecting attacks such as XSS and web bruteforce. Further, we put forward a novel data augmentation technique that boosts the generalization abilities of a broad range of supervised deep learning algorithms, leading to average F1 score gains above 35%.  ( 2 min )
    Infographics Wizard: Flexible Infographics Authoring and Design Exploration. (arXiv:2204.09904v1 [cs.HC])
    Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be a tedious process for non-experts, and time-consuming even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no requirement for design input. However, expert designers can still provide custom design inputs to customize the infographics. We also contribute an individual visual group (VG) design dataset (in SVG), along with a 1k complete infographic image dataset with segmented VGs. Evaluation results confirm that by using our framework, designers of all expertise levels can generate generic infographic designs faster than with existing methods while maintaining the same quality as hand-designed infographic templates.  ( 2 min )
    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. (arXiv:2110.03753v3 [cs.LG] UPDATED)
    Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable; however, their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., k-egonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than the encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNN-AK (GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks; specifically, 0.08 MAE on ZINC, and 74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.  ( 2 min )
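    The k-egonet that GNN-AK encodes around each node is simply the set of nodes within k hops. A minimal sketch of that extraction step (just the neighborhood collection, not the GNN encoder), with an adjacency-dict representation of our own choosing:

```python
from collections import deque

def k_egonet(adj, root, k):
    """Return the set of nodes within k hops of `root` via BFS.

    `adj` maps node -> list of neighbours. The induced subgraph on
    this node set is what a GNN-AK-style wrapper would encode in
    place of the usual star-shaped immediate neighbourhood.
    """
    seen = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if seen[u] == k:        # do not expand past k hops
            continue
        for v in adj[u]:
            if v not in seen:
                seen[v] = seen[u] + 1
                queue.append(v)
    return set(seen)

# Toy graph: a path 0-1-2-3 plus a triangle 1-2-4.
adj = {0: [1], 1: [0, 2, 4], 2: [1, 3, 4], 3: [2], 4: [1, 2]}
```

With k=1 this reduces to the star pattern of a plain MPNN; larger k gives the richer subgraph context the framework exploits.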
    CNLL: A Semi-supervised Approach For Continual Noisy Label Learning. (arXiv:2204.09881v1 [cs.CV])
    The task of continual learning requires careful design of algorithms that can tackle catastrophic forgetting. However, the noisy label, which is inevitable in a real-world scenario, seems to exacerbate the situation. While very few studies have addressed the issue of continual learning under noisy labels, long training time and complicated training schemes limit their applications in most cases. In contrast, we propose a simple purification technique to effectively cleanse the online data stream that is both cost-effective and more accurate. After purification, we perform fine-tuning in a semi-supervised fashion that ensures the participation of all available samples. Training in this fashion helps us learn a better representation that results in state-of-the-art (SOTA) performance. Through extensive experimentation on 3 benchmark datasets, MNIST, CIFAR10 and CIFAR100, we show the effectiveness of our proposed approach. We achieve a 24.8% performance gain for CIFAR10 with 20% noise over previous SOTA methods. Our code is publicly available.  ( 2 min )
    Holmes: An Efficient and Lightweight Semantic Based Anomalous Email Detector. (arXiv:2104.08044v11 [cs.CR] UPDATED)
    Email threats are a serious issue for enterprise security, covering various malicious scenarios such as phishing, fraud, blackmail and malvertisement. Traditional anti-spam gateways commonly maintain a greylist to filter out unexpected emails based on suspicious vocabulary in the mail subject and content. However, this signature-based approach cannot effectively discover novel and unknown suspicious emails that exploit currently hot topics, such as COVID-19 and the US election. To address the problem, in this paper we present Holmes, an efficient and lightweight semantic-based engine for anomalous email detection. Holmes converts each email event log to a sentence through word embedding, then extracts interesting items among them by novelty detection. Based on our observations, we claim that, in an enterprise environment, there is a stable relation between senders and receivers, while suspicious emails commonly come from unusual sources, which can be detected through rareness selection. We evaluate the performance of Holmes in a real-world enterprise environment, in which around 5,000 emails are sent and received each day. As a result, Holmes achieves a high detection rate (outputting around 200 suspicious emails per day) and maintains a low false alarm rate for anomaly detection.  ( 3 min )
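    The "rareness selection" idea -- a stable sender/receiver relation, with suspicious mail arriving from unusual sources -- can be sketched as a simple frequency check. This is an illustrative stand-in, not Holmes's actual algorithm; names and the threshold are ours.

```python
from collections import Counter

def rare_pairs(history, new_events, min_count=2):
    """Flag (sender, receiver) pairs in `new_events` seen fewer than
    `min_count` times in `history` -- a toy version of rareness
    selection over the stable sender/receiver relation.
    """
    counts = Counter(history)
    return [e for e in new_events if counts[e] < min_count]

history = [("alice", "bob")] * 5 + [("carol", "bob")] * 3 + [("mallory", "bob")]
new = [("alice", "bob"), ("mallory", "bob"), ("eve", "bob")]
flagged = rare_pairs(history, new)
```

A well-established pair like ("alice", "bob") passes, while pairs with little or no history are surfaced for inspection.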
    Neural Topic Modeling of Psychotherapy Sessions. (arXiv:2204.10189v1 [cs.CL])
    In this work, we compare different neural topic modeling methods in learning the topical propensities of different psychiatric conditions from the psychotherapy session transcripts parsed from speech recordings. We also incorporate temporal modeling to put this additional interpretability to action by parsing out topic similarities as a time series in a turn-level resolution. We believe this topic modeling framework can offer interpretable insights for the therapist to optimally decide his or her strategy and improve the psychotherapy effectiveness.  ( 2 min )
    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. (arXiv:2010.03622v5 [cs.LG] UPDATED)
    Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.  ( 2 min )
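    One concrete ingredient of the self-training algorithms analyzed above is selecting confident pseudolabels on unlabeled data before refitting. A minimal sketch of that selection step (the function name and threshold are ours; the paper's analysis additionally relies on input-consistency regularization):

```python
def select_pseudolabels(probs, threshold=0.9):
    """Keep only unlabeled examples whose top predicted class
    probability exceeds `threshold`; return (index, label) pairs
    that a student model would then be trained to fit."""
    selected = []
    for i, p in enumerate(probs):
        label = max(range(len(p)), key=lambda c: p[c])
        if p[label] >= threshold:
            selected.append((i, label))
    return selected

# Predicted class distributions for three unlabeled examples.
probs = [[0.95, 0.05], [0.55, 0.45], [0.10, 0.90]]
pseudo = select_pseudolabels(probs)
```

Under the paper's expansion assumption, fitting such confident pseudolabels (plus consistency regularization) provably propagates correct labels into the low-probability regions they neighbor.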
    Why I'm not Answering: Understanding Determinants of Classification of an Abstaining Classifier for Cancer Pathology Reports. (arXiv:2009.05094v5 [cs.LG] UPDATED)
    Safe deployment of deep learning systems in critical real world applications requires models to make very few mistakes, and only under predictable circumstances. In this work, we address this problem using an abstaining classifier that is tuned to have $>$95% accuracy, and then identify the determinants of abstention using LIME. Essentially, we are training our model to learn the attributes of pathology reports that are likely to lead to incorrect classifications, albeit at the cost of reduced sensitivity. We demonstrate an abstaining classifier in a multitask setting for classifying cancer pathology reports from the NCI SEER cancer registries on six tasks of interest. For these tasks, we reduce the classification error rate by factors of 2--5 by abstaining on 25--45% of the reports. For the specific task of classifying cancer site, we are able to identify metastasis, reports involving lymph nodes, and discussion of multiple cancer sites as responsible for many of the classification mistakes, and observe that the extent and types of mistakes vary systematically with cancer site (e.g., breast, lung, and prostate). When combining across three of the tasks, our model classifies 50% of the reports with an accuracy greater than 95% for three of the six tasks, and greater than 85% for all six tasks on the retained samples. Furthermore, we show that LIME provides a better determinant of classification than measures of word occurrence alone. By combining a deep abstaining classifier with feature identification using LIME, we are able to identify concepts responsible for both correctness and abstention when classifying cancer sites from pathology reports. The improvement of LIME over keyword searches is statistically significant, presumably because words are assessed in context and have been identified as a local determinant of classification.  ( 3 min )
    Learnable Model Augmentation Self-Supervised Learning for Sequential Recommendation. (arXiv:2204.10128v1 [cs.IR])
    Sequential Recommendation aims to predict the next item based on user behaviour. Recently, Self-Supervised Learning (SSL) has been proposed to improve recommendation performance. However, most existing SSL methods use a uniform data augmentation scheme, which loses the sequence correlation of the original sequence. To this end, in this paper we propose Learnable Model Augmentation self-supervised learning for sequential Recommendation (LMA4Rec). Specifically, LMA4Rec first takes model augmentation as a supplementary method to data augmentation for generating views. Then, LMA4Rec uses learnable Bernoulli dropout to implement the model augmentation operations. Next, self-supervised learning between the contrastive views extracts self-supervised signals from the original sequence. Finally, experiments on three public datasets show that LMA4Rec effectively improves sequential recommendation performance compared with baseline methods.  ( 2 min )
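    Bernoulli dropout, the building block LMA4Rec makes learnable, zeroes activations at random and rescales the survivors. A minimal sketch with a fixed keep probability (in LMA4Rec the keep probabilities are learned end-to-end, which this toy version omits):

```python
import random

def bernoulli_dropout(x, keep_prob, rng):
    """Zero each element with probability 1 - keep_prob and rescale
    survivors by 1/keep_prob (inverted dropout), so the expectation
    of the output matches the input. Two calls with different masks
    yield the two augmented "views" used for contrastive learning.
    """
    return [xi / keep_prob if rng.random() < keep_prob else 0.0 for xi in x]

rng = random.Random(0)
x = [1.0, 2.0, 3.0, 4.0]
view_a = bernoulli_dropout(x, keep_prob=0.5, rng=rng)
view_b = bernoulli_dropout(x, keep_prob=0.5, rng=rng)
```

Making `keep_prob` a trainable parameter (e.g. via a relaxed Bernoulli) is what turns this from a fixed augmentation into a learnable model augmentation.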
    TorchSparse: Efficient Point Cloud Inference Engine. (arXiv:2204.10319v1 [cs.LG])
    Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on the general-purpose hardware. Furthermore, existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement. It applies adaptive matrix multiplication grouping to trade computation for better regularity, achieving 1.4-1.5x speedup for matrix multiplication. It also optimizes the data movement by adopting vectorized, quantized and fused locality-aware memory access, reducing the memory movement cost by 2.7x. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6x and 1.5x measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.  ( 2 min )
    Learning Future Object Prediction with a Spatiotemporal Detection Transformer. (arXiv:2204.10321v1 [cs.CV])
    We explore future object prediction -- a challenging problem where all objects visible in a future video frame are to be predicted. We propose to tackle this problem end-to-end by training a detection transformer to directly output future objects. In order to make accurate predictions about the future, it is necessary to capture the dynamics in the scene, both of other objects and of the ego-camera. We extend existing detection transformers in two ways to capture the scene dynamics. First, we experiment with three different mechanisms that enable the model to spatiotemporally process multiple frames. Second, we feed ego-motion information to the model via cross-attention. We show that both of these cues substantially improve future object prediction performance. Our final approach learns to capture the dynamics and makes predictions on par with an oracle for 100 ms prediction horizons, and outperforms baselines for longer prediction horizons.  ( 2 min )
    Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces. (arXiv:1905.09449v5 [cs.LG] UPDATED)
    The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge amount of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation are developed, they are expensive in fully training a dense network as backward selection methods, and there is still a void on systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill in this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils `winning tickets' in early epochs: the effective sparse network structures with comparable test accuracy to fully trained over-parameterized models, that are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating a good performance with much less computational cost. Codes and models can be downloaded at {https://github.com/DessiLBI2020/DessiLBI}.  ( 3 min )
    Multi-Component Optimization and Efficient Deployment of Neural-Networks on Resource-Constrained IoT Hardware. (arXiv:2204.10183v1 [cs.LG])
    The majority of IoT devices like smartwatches, smart plugs, HVAC controllers, etc., are powered by hardware with constrained specifications (low memory, clock speed and processor) that are insufficient to accommodate and execute large, high-quality models. On such resource-constrained devices, manufacturers still manage to provide attractive functionalities (to boost sales) by following the traditional approach of programming IoT devices/products to collect and transmit data (image, audio, sensor readings, etc.) to their cloud-based ML analytics platforms. For decades, this online approach has been facing issues such as compromised data streams, non-real-time analytics due to latency, bandwidth constraints, costly subscriptions, recent privacy issues raised by users and the GDPR guidelines, etc. In this paper, to enable ultra-fast and accurate AI-based offline analytics on resource-constrained IoT devices, we present an end-to-end multi-component model optimization sequence and open-source its implementation. Researchers and developers can use our optimization sequence to optimize high-memory, computation-demanding models in multiple aspects in order to produce small-size, low-latency, low-power-consuming models that can comfortably fit and execute on resource-constrained hardware. The experimental results show that our optimization components can produce models that are: (i) 12.06x compressed; (ii) 0.13% to 0.27% more accurate; (iii) orders of magnitude faster, with unit inference in 0.06 ms. Our optimization sequence is generic and can be applied to any state-of-the-art models trained for anomaly detection, predictive maintenance, robotics, voice recognition, and machine vision.  ( 2 min )
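    One typical component of such a compression sequence is post-training quantization. A minimal sketch of symmetric per-tensor int8 quantization (scheme and names are ours for illustration; the paper's exact pipeline may differ):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: map float weights to
    [-127, 127] with a single scale, shrinking storage by ~4x versus
    float32."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for inference-time use."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

The round-trip error is bounded by half the scale step, which is why accuracy often survives quantization on well-conditioned models.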
    A Sandbox Tool to Bias(Stress)-Test Fairness Algorithms. (arXiv:2204.10233v1 [cs.LG])
    Motivated by the growing importance of reducing unfairness in ML predictions, Fair-ML researchers have presented an extensive suite of algorithmic "fairness-enhancing" remedies. Most existing algorithms, however, are agnostic to the sources of the observed unfairness. As a result, the literature currently lacks guiding frameworks to specify conditions under which each algorithmic intervention can potentially alleviate the underpinning cause of unfairness. To close this gap, we scrutinize the underlying biases (e.g., in the training data or design choices) that cause observational unfairness. We present a bias-injection sandbox tool to investigate fairness consequences of various biases and assess the effectiveness of algorithmic remedies in the presence of specific types of bias. We call this process the bias(stress)-testing of algorithmic interventions. Unlike existing toolkits, ours provides a controlled environment to counterfactually inject biases in the ML pipeline. This stylized setup offers the distinct capability of testing fairness interventions beyond observational data and against an unbiased benchmark. In particular, we can test whether a given remedy can alleviate the injected bias by comparing the predictions resulting after the intervention in the biased setting with true labels in the unbiased regime -- that is, before any bias injection. We illustrate the utility of our toolkit via a proof-of-concept case study on synthetic data. Our empirical analysis showcases the type of insights that can be obtained through our simulations.  ( 2 min )
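    One stylized bias such a sandbox could inject counterfactually is group-dependent label noise. A toy sketch of that injection step (function and parameter names are ours, not the toolkit's API):

```python
import random

def inject_label_bias(labels, groups, target_group, flip_rate, rng):
    """Flip positive labels (1 -> 0) for members of `target_group`
    with probability `flip_rate`. Comparing an intervention's
    predictions on the biased data against the untouched labels is
    the kind of stress test the sandbox enables.
    """
    return [0 if (g == target_group and y == 1 and rng.random() < flip_rate) else y
            for y, g in zip(labels, groups)]

rng = random.Random(0)
labels = [1, 1, 1, 1, 0, 0]
groups = ["a", "b", "a", "b", "a", "b"]
biased = inject_label_bias(labels, groups, "a", flip_rate=1.0, rng=rng)
```

Because the injection is controlled, the unbiased labels remain available as the benchmark against which a fairness remedy can be judged.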
    Feature anomaly detection system (FADS) for intelligent manufacturing. (arXiv:2204.10318v1 [cs.CV])
    Anomaly detection is important for industrial automation and part quality assurance, and while humans can easily detect anomalies in components given a few examples, designing a generic automated system that can perform at or above human capabilities remains a challenge. In this work, we present a simple new anomaly detection algorithm called FADS (feature-based anomaly detection system) which leverages pretrained convolutional neural networks (CNN) to generate a statistical model of nominal inputs by observing the activation of the convolutional filters. During inference the system compares the convolutional filter activation of the new input to the statistical model and flags activations that are outside the expected range of values and therefore likely an anomaly. By using a pretrained network, FADS demonstrates excellent performance similar to or better than other machine learning approaches to anomaly detection, while requiring no tuning of the CNN weights. We demonstrate the ability of FADS by detecting process parameter changes on a custom dataset of additively manufactured lattices. The FADS localization algorithm shows that textural differences that are visible on the surface can be used to detect process parameter changes. In addition, we test FADS on benchmark datasets, such as the MVTec Anomaly Detection dataset, and report good results.  ( 2 min )
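    The statistical model FADS builds over filter activations can be sketched as per-filter range checks. A minimal stand-in (our simplification: a mean +/- k*sigma rule over a matrix of activations, with rows as nominal samples and columns as filters):

```python
import numpy as np

def fit_stats(activations):
    """Per-filter mean and standard deviation over nominal samples."""
    return activations.mean(axis=0), activations.std(axis=0)

def is_anomaly(x, mean, std, k=3.0):
    """Flag an input whose activation on any filter falls outside
    mean +/- k*std of the nominal statistics."""
    return bool(np.any(np.abs(x - mean) > k * std))

# Toy nominal activations for two filters.
nominal = np.array([[1.0, 5.0], [1.2, 5.1], [0.8, 4.9], [1.0, 5.0]])
mean, std = fit_stats(nominal)
```

Because only these statistics are fitted, the pretrained CNN weights never need tuning, matching the abstract's claim.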
    A two-level machine learning framework for predictive maintenance: comparison of learning formulations. (arXiv:2204.10083v1 [cs.LG])
    Predicting incoming failures and scheduling maintenance based on sensors information in industrial machines is increasingly important to avoid downtime and machine failure. Different machine learning formulations can be used to solve the predictive maintenance problem. However, many of the approaches studied in the literature are not directly applicable to real-life scenarios. Indeed, many of those approaches usually either rely on labelled machine malfunctions in the case of classification and fault detection, or rely on finding a monotonic health indicator on which a prediction can be made in the case of regression and remaining useful life estimation, which is not always feasible. Moreover, the decision-making part of the problem is not always studied in conjunction with the prediction phase. This paper aims to design and compare different formulations for predictive maintenance in a two-level framework and design metrics that quantify both the failure detection performance as well as the timing of the maintenance decision. The first level is responsible for building a health indicator by aggregating features using a learning algorithm. The second level consists of a decision-making system that can trigger an alarm based on this health indicator. Three degrees of refinements are compared in the first level of the framework, from simple threshold-based univariate predictive technique to supervised learning methods based on the remaining time before failure. We choose to use the Support Vector Machine (SVM) and its variations as the common algorithm used in all the formulations. We apply and compare the different strategies on a real-world rotating machine case study and observe that while a simple model can already perform well, more sophisticated refinements enhance the predictions for well-chosen parameters.  ( 2 min )
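    The second level of the framework -- a decision system that raises an alarm from a health indicator -- can be sketched as a persistence-checked threshold rule. This is a minimal illustration of our own, not the paper's exact decision logic:

```python
def first_alarm(health, threshold, patience=2):
    """Trigger at the first index where the health indicator stays
    above `threshold` for `patience` consecutive steps; return None
    if it never does. The patience guard suppresses one-off spikes.
    """
    run = 0
    for i, h in enumerate(health):
        run = run + 1 if h > threshold else 0
        if run >= patience:
            return i
    return None

# Toy health indicator aggregated by the first level (higher = worse).
health = [0.1, 0.2, 0.7, 0.3, 0.8, 0.9, 0.95]
alarm_at = first_alarm(health, threshold=0.6)
```

The paper's metrics then score both whether this alarm fires before failure and how timely it is, which is why prediction and decision-making are evaluated jointly.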
    Learning spatiotemporal features from incomplete data for traffic flow prediction using hybrid deep neural networks. (arXiv:2204.10222v1 [cs.LG])
    Urban traffic flow prediction using data-driven models can play an important role in route planning and preventing congestion on highways. These methods utilize data collected from traffic recording stations at different timestamps to predict the future status of traffic. Hence, data collection, transmission, storage, and extraction techniques can have a significant impact on the performance of the traffic flow model. On the other hand, a comprehensive database can provide the opportunity for using complex, yet reliable predictive models such as deep learning methods. However, most of these methods have difficulties in handling missing values and outliers. This study focuses on hybrid deep neural networks to predict traffic flow in the California Freeway Performance Measurement System (PeMS) with missing values. The proposed networks are based on a combination of recurrent neural networks (RNNs) to consider the temporal dependencies in the data recorded in each station and convolutional neural networks (CNNs) to take the spatial correlations in the adjacent stations into account. Various architecture configurations with series and parallel connections are considered based on RNNs and CNNs, and several prevalent data imputation techniques are used to examine the robustness of the hybrid networks to missing values. A comprehensive analysis performed on two different datasets from PeMS indicates that the proposed series-parallel hybrid network with the mean imputation technique achieves the lowest error in predicting the traffic flow and is robust to missing values up to a 21% missing ratio in both complete and incomplete training data scenarios when applied to an incomplete test data.  ( 2 min )
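    The mean imputation that performs best in the abstract's comparison replaces each missing reading with the column (per-station) mean of the observed entries. A minimal sketch, with rows as timestamps and columns as stations (an assumed layout):

```python
import numpy as np

def mean_impute(x):
    """Replace NaNs in each column with that column's mean over the
    observed entries -- the simple per-station imputation used
    before feeding the hybrid RNN/CNN networks."""
    out = x.copy()
    col_means = np.nanmean(out, axis=0)
    idx = np.where(np.isnan(out))
    out[idx] = np.take(col_means, idx[1])
    return out

# Toy flow readings: 3 timestamps x 2 stations, with two gaps.
x = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 8.0]])
filled = mean_impute(x)
```

More elaborate imputers (interpolation, model-based) fit the same interface, which is how the study swaps techniques to probe robustness.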
    Automated analysis of fibrous cap in intravascular optical coherence tomography images of coronary arteries. (arXiv:2204.10162v1 [cs.LG])
    Thin-cap fibroatheroma (TCFA) and plaque rupture have been recognized as the most frequent risk factor for thrombosis and acute coronary syndrome. Intravascular optical coherence tomography (IVOCT) can identify TCFA and assess cap thickness, which provides an opportunity to assess plaque vulnerability. We developed an automated method that can detect lipidous plaque and assess fibrous cap thickness in IVOCT images. This study analyzed a total of 4,360 IVOCT image frames of 77 lesions among 41 patients. To improve segmentation performance, preprocessing included lumen segmentation, pixel-shifting, and noise filtering on the raw polar (r, theta) IVOCT images. We used the DeepLab-v3 plus deep learning model to classify lipidous plaque pixels. After lipid detection, we automatically detected the outer border of the fibrous cap using a special dynamic programming algorithm and assessed the cap thickness. Our method provided excellent discriminability of lipid plaque with a sensitivity of 85.8% and A-line Dice coefficient of 0.837. By comparing lipid angle measurements between two analysts following editing of our automated software, we found good agreement by Bland-Altman analysis (difference 6.7+/-17 degree; mean 196 degree). Our method accurately detected the fibrous cap from the detected lipid plaque. Automated analysis required a significant modification for only 5.5% frames. Furthermore, our method showed a good agreement of fibrous cap thickness between two analysts with Bland-Altman analysis (4.2+/-14.6 micron; mean 175 micron), indicating little bias between users and good reproducibility of the measurement. We developed a fully automated method for fibrous cap quantification in IVOCT images, resulting in good agreement with determinations by analysts. The method has great potential to enable highly automated, repeatable, and comprehensive evaluations of TCFAs.  ( 2 min )
    The NIST CTS Speaker Recognition Challenge. (arXiv:2204.10228v1 [eess.AS])
    The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all of the trials. The CTS Challenge has also served, and will continue to do so, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The CTS Challenge results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets along with angular margin losses for speaker embedding extraction, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, 4) long-duration fine-tuning.  ( 2 min )
    Revisiting Gaussian mixture critic in off-policy reinforcement learning: a sample-based approach. (arXiv:2204.10256v1 [cs.LG])
    Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks. Examples of this behavior include the D4PG and DMPO algorithms as compared to DDPG and MPO, respectively [Barth-Maron et al., 2018; Hoffman et al., 2020]. However, both agents rely on the C51 critic for value estimation. One major drawback of the C51 approach is its requirement of prior knowledge about the minimum and maximum values a policy can attain, as well as the number of bins used, which fixes the resolution of the distributional estimate. While the DeepMind control suite of tasks utilizes standardized rewards and episode lengths, thus enabling the entire suite to be solved with a single setting of these hyperparameters, this is often not the case. This paper revisits a natural alternative that removes this requirement, namely a mixture of Gaussians, and a simple sample-based loss function to train it in an off-policy regime. We empirically evaluate its performance on a broad range of continuous control tasks and demonstrate that it eliminates the need for these distributional hyperparameters and achieves state-of-the-art performance on a variety of challenging tasks (e.g. the humanoid, dog, quadruped, and manipulator domains). Finally, we provide an implementation in the Acme agent repository.  ( 2 min )
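    A sample-based loss for a Gaussian-mixture critic amounts to the negative log-likelihood of sampled returns under the mixture, which needs no value-range or bin-count hyperparameters. A minimal 1-D sketch (our simplification of the idea; the paper's loss operates on bootstrapped return samples inside an actor-critic agent):

```python
import numpy as np

def gmm_logpdf(x, weights, means, stds):
    """Log-density of scalar x under a mixture of Gaussians,
    computed with log-sum-exp for numerical stability."""
    log_comp = (np.log(weights)
                - 0.5 * np.log(2 * np.pi * stds ** 2)
                - 0.5 * ((x - means) / stds) ** 2)
    m = log_comp.max()
    return m + np.log(np.exp(log_comp - m).sum())

def sample_nll(samples, weights, means, stds):
    """Negative log-likelihood of return samples under the critic's
    mixture -- minimizing this fits the mixture to the samples."""
    return -np.mean([gmm_logpdf(s, weights, means, stds) for s in samples])

# A two-component critic head with modes at 0 and 5.
weights = np.array([0.5, 0.5])
means = np.array([0.0, 5.0])
stds = np.array([1.0, 1.0])
```

Unlike C51, the mixture's means and scales move freely during training, so no prior min/max value bound is needed.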
    IIITDWD-ShankarB@ Dravidian-CodeMixi-HASOC2021: mBERT based model for identification of offensive content in south Indian languages. (arXiv:2204.10195v1 [cs.CL])
    In recent years, there has been a lot of focus on offensive content. The amount of offensive content generated by social media is increasing at an alarming rate. This created a greater need to address this issue than ever before. To address these issues, the organizers of "Dravidian-Code Mixed HASOC-2020" have created two challenges. Task 1 involves identifying offensive content in Malayalam data, whereas Task 2 includes Malayalam and Tamil Code Mixed Sentences. Our team participated in Task 2. In our suggested model, we experiment with multilingual BERT to extract features, and three different classifiers are used on extracted features. Our model received a weighted F1 score of 0.70 for Malayalam data and was ranked fifth; we also received a weighted F1 score of 0.573 for Tamil Code Mixed data and were ranked eleventh.  ( 2 min )
    Unsupervised Numerical Reasoning to Extract Phenotypes from Clinical Text by Leveraging External Knowledge. (arXiv:2204.10202v1 [cs.CL])
    Extracting phenotypes from clinical text has been shown to be useful for a variety of clinical use cases such as identifying patients with rare diseases. However, reasoning with numerical values remains challenging for phenotyping in clinical text, for example, temperature 102F representing Fever. Current state-of-the-art phenotyping models are able to detect general phenotypes, but perform poorly when they detect phenotypes requiring numerical reasoning. We present a novel unsupervised methodology leveraging external knowledge and contextualized word embeddings from ClinicalBERT for numerical reasoning in a variety of phenotypic contexts. Comparing against unsupervised benchmarks, it shows a substantial performance improvement with absolute gains on generalized Recall and F1 scores up to 79% and 71%, respectively. In the supervised setting, it also surpasses the performance of alternative approaches with absolute gains on generalized Recall and F1 scores up to 70% and 44%, respectively.  ( 2 min )
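    The core idea of mapping a numeric mention such as "temperature 102F" to a phenotype via external knowledge can be sketched as a range lookup. The reference ranges and names below are illustrative placeholders, not the paper's knowledge base.

```python
# Hypothetical reference ranges: (phenotype, lower bound, upper bound); None = unbounded.
REFERENCE_RANGES = {
    "temperature_f": [("Hypothermia", None, 95.0), ("Fever", 100.4, None)],
    "heart_rate":    [("Bradycardia", None, 60.0), ("Tachycardia", 100.0, None)],
}

def numeric_to_phenotype(measurement, value):
    """Map a numeric clinical value onto a phenotype via external-knowledge ranges."""
    for phenotype, low, high in REFERENCE_RANGES.get(measurement, []):
        below_ok = low is None or value >= low
        above_ok = high is None or value <= high
        if below_ok and above_ok:
            return phenotype
    return None
```

    In the paper's unsupervised setting, contextualized embeddings from ClinicalBERT would additionally disambiguate which measurement a number refers to before a lookup like this applies.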
    Is Neuron Coverage Needed to Make Person Detection More Robust?. (arXiv:2204.10027v1 [cs.CV])
    The growing use of deep neural networks (DNNs) in safety- and security-critical areas like autonomous driving raises the need for their systematic testing. Coverage-guided testing (CGT) is an approach that applies mutation or fuzzing according to a predefined coverage metric to find inputs that cause misbehavior. With the introduction of a neuron coverage metric, CGT has also recently been applied to DNNs. In this work, we apply CGT to the task of person detection in crowded scenes. The proposed pipeline uses YOLOv3 for person detection and includes finding DNN bugs via sampling and mutation, and subsequent DNN retraining on the updated training set. To be a bug, we require a mutated image to cause a significant performance drop compared to a clean input. In accordance with the CGT, we also consider an additional requirement of increased coverage in the bug definition. In order to explore several types of robustness, our approach includes natural image transformations, corruptions, and adversarial examples generated with the Daedalus attack. The proposed framework has uncovered several thousand cases of incorrect DNN behavior. The relative change in mAP performance of the retrained models reached on average between 26.21\% and 64.24\% for different robustness types. However, we have found no evidence that the investigated coverage metrics can be advantageously used to improve robustness.  ( 2 min )
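    The bug definition described above, a significant performance drop on a mutated image, optionally combined with a coverage increase, can be sketched as a simple predicate. The threshold value is illustrative, not the paper's setting.

```python
def is_bug(clean_map, mutated_map, clean_cov, mutated_cov,
           drop_threshold=0.1, require_coverage=True):
    """Flag a mutated input as a bug: significant mAP drop vs. the clean input,
    optionally also requiring that neuron coverage increased (the CGT criterion).
    The drop threshold here is illustrative, not the paper's value."""
    significant_drop = (clean_map - mutated_map) >= drop_threshold
    coverage_gain = mutated_cov > clean_cov
    return significant_drop and (coverage_gain or not require_coverage)
```

    Toggling `require_coverage` corresponds to the paper's comparison of bug definitions with and without the coverage requirement.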
    Evolution and use of data science vocabulary. How much have we changed in 13 years?. (arXiv:2204.10174v1 [cs.DL])
    Here I present an investigation of the evolution and use of vocabulary in data science over the last 13 years. Based on a rigorous statistical analysis, a database with 12,787 documents containing the words "data science" in the title, abstract or keywords is analyzed. It is proposed to classify the evolution of this discipline in three periods: emergence, growth and boom. Characteristic words and pioneering documents are identified for each period. By identifying the distinctive vocabulary and relevant topics of data science, classified into time periods, these results add value to the scientific community of this discipline.  ( 2 min )
    Detecting Topology Attacks against Graph Neural Networks. (arXiv:2204.10072v1 [cs.LG])
    Graph neural networks (GNNs) have been widely used in many real applications, and recent studies have revealed their vulnerabilities against topology attacks. To address this issue, existing efforts have mainly been dedicated to improving the robustness of GNNs, while little attention has been paid to the detection of such attacks. In this work, we study the victim node detection problem under topology attacks against GNNs. Our approach is built upon the key observation rooted in the intrinsic message passing nature of GNNs. That is, the neighborhood of a victim node tends to have two competing group forces, pushing the node classification results towards the original label and the targeted label, respectively. Based on this observation, we propose to detect victim nodes by deliberately designing an effective measurement of the neighborhood variance for each node. Extensive experimental results on four real-world datasets and five existing topology attacks show the effectiveness and efficiency of the proposed detection approach.  ( 2 min )
    Working memory inspired hierarchical video decomposition with transformative representations. (arXiv:2204.10105v1 [cs.CV])
    Video decomposition is very important to extract moving foreground objects from complex backgrounds in computer vision, machine learning, and medical imaging, e.g., extracting moving contrast-filled vessels from the complex and noisy backgrounds of X-ray coronary angiography (XCA). However, the challenges caused by dynamic backgrounds, overlapping heterogeneous environments and complex noises still exist in video decomposition. To solve these problems, this study is the first to introduce a flexible visual working memory model in video decomposition tasks to provide interpretable and high-performance hierarchical deep architecture, integrating the transformative representations between sensory and control layers from the perspective of visual and cognitive neuroscience. Specifically, robust PCA unrolling networks acting as a structure-regularized sensor layer decompose XCA into sparse/low-rank structured representations to separate moving contrast-filled vessels from noisy and complex backgrounds. Then, patch recurrent convolutional LSTM networks with a backprojection module embody unstructured random representations of the control layer in working memory, recurrently projecting spatiotemporally decomposed nonlocal patches into orthogonal subspaces for heterogeneous vessel retrieval and interference suppression. This video decomposition deep architecture effectively restores the heterogeneous profiles of intensity and the geometries of moving objects against the complex background interferences. Experiments show that the proposed method significantly outperforms state-of-the-art methods in accurate moving contrast-filled vessel extraction with excellent flexibility and computational efficiency.  ( 2 min )
    Robustness of Machine Learning Models Beyond Adversarial Attacks. (arXiv:2204.10046v1 [cs.LG])
    Correctly quantifying the robustness of machine learning models is a central aspect in judging their suitability for specific tasks, and thus, ultimately, for generating trust in the models. We show that the widely used concept of adversarial robustness and closely related metrics based on counterfactuals are not necessarily valid metrics for determining the robustness of ML models against perturbations that occur "naturally", outside specific adversarial attack scenarios. Additionally, we argue that generic robustness metrics in principle are insufficient for determining real-world robustness. Instead we propose a flexible approach that models possible perturbations in input data individually for each application. This is then combined with a probabilistic approach that computes the likelihood that a real-world perturbation will change a prediction, thus giving quantitative information of the robustness of the trained machine learning model. The method does not require access to the internals of the classifier and thus in principle works for any black-box model. It is, however, based on Monte-Carlo sampling and thus only suited for input spaces with small dimensions. We illustrate our approach on two datasets, as well as on analytically solvable cases. Finally, we discuss ideas on how real-world robustness could be computed or estimated in high-dimensional input spaces.  ( 2 min )
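    The probabilistic, black-box procedure described above can be sketched as a Monte-Carlo estimate of the flip probability under an application-specific perturbation model. The function names and perturbation interface are assumptions for illustration.

```python
import random

def real_world_robustness(predict, x, perturb, n_samples=1000, seed=0):
    """Estimate the probability that a realistic perturbation changes the prediction.

    predict: black-box classifier; perturb(x, rng): draws one perturbed copy of x
    from the application-specific perturbation model. A Monte-Carlo estimate, so
    only suited to low-dimensional inputs, as the abstract notes.
    """
    rng = random.Random(seed)
    baseline = predict(x)
    flips = sum(predict(perturb(x, rng)) != baseline for _ in range(n_samples))
    return flips / n_samples
```

    A return value near 0 indicates the prediction is stable under realistic perturbations; values near 1 indicate fragility.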
    Fluctuation-based Outlier Detection. (arXiv:2204.10007v1 [cs.LG])
    Outlier detection is an important topic in machine learning and has been used in a wide range of applications. Outliers are objects that are few in number and deviate from the majority of objects. As a result of these two properties, we show that outliers are susceptible to a mechanism called fluctuation. This article proposes a method called fluctuation-based outlier detection (FBOD) that achieves a low linear time complexity and detects outliers purely based on the concept of fluctuation without employing any distance, density or isolation measure. Fundamentally different from all existing methods, FBOD first converts Euclidean-structured datasets into graphs by using random links, then propagates the feature value according to the connection of the graph. Finally, by comparing the difference between the fluctuation of an object and its neighbors, FBOD determines the object with a larger difference as an outlier. The results of experiments comparing FBOD with seven state-of-the-art algorithms on eight real-world tabular datasets and three video datasets show that FBOD outperforms its competitors in the majority of cases and that FBOD has only 5% of the execution time of the fastest algorithm. The experiment codes are available at: https://github.com/FluctuationOD/Fluctuation-based-Outlier-Detection.  ( 2 min )
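    The three steps described, random-link graph construction, value propagation, and fluctuation comparison, can be sketched as follows. This is an illustrative reading of the abstract, not the authors' released code, and the propagation rule and neighbor counts are assumptions.

```python
import random
import statistics

def fbod_scores(values, k=3, iters=5, seed=0):
    """Sketch of fluctuation-based outlier scoring for 1-D data.

    1) Link each object to k random others (the random-link graph).
    2) Propagate feature values: each object takes its neighbors' mean.
    3) Score = |object's fluctuation - mean fluctuation of its neighbors|,
       where fluctuation = |propagated value - original value|.
    """
    rng = random.Random(seed)
    n = len(values)
    links = [rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    state = list(values)
    for _ in range(iters):
        state = [statistics.mean(state[j] for j in links[i]) for i in range(n)]
    fluct = [abs(state[i] - values[i]) for i in range(n)]
    return [abs(fluct[i] - statistics.mean(fluct[j] for j in links[i]))
            for i in range(n)]
```

    Because propagation pulls every object toward its (mostly inlying) neighbors, the rare, deviating objects accumulate the largest fluctuation relative to their neighborhood, which is exactly the property the method exploits.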
    Multi-Tier Platform for Cognizing Massive Electroencephalogram. (arXiv:2204.09840v1 [eess.SP])
    An end-to-end platform assembling multiple tiers is built for precisely cognizing brain activities. Being fed massive electroencephalogram (EEG) data, the time-frequency spectrograms are conventionally projected into the episode-wise feature matrices (seen as tier-1). A spiking neural network (SNN) based tier is designed to distill the principal information in terms of spike-streams from the raw features, which maintains the temporal implication in the nature of EEGs. The proposed tier-3 transposes time- and space-domain of spike patterns from the SNN; and feeds the transposed pattern-matrices into an artificial neural network (ANN, Transformer specifically) known as tier-4, where a special spanning topology is proposed to match the two-dimensional input form. In this manner, cognition such as classification is conducted with high accuracy. For proof-of-concept, the sleep stage scoring problem is demonstrated by introducing multiple EEG datasets with the largest comprising 42,560 hours recorded from 5,793 subjects. From experiment results, our platform achieves an overall cognition accuracy of 87% by leveraging sole EEG, which is 2% superior to the state-of-the-art. Moreover, our developed multi-tier methodology offers visible and graphical interpretations of the temporal characteristics of EEG by identifying the critical episodes, which is demanded in neurodynamics but hardly appears in conventional cognition scenarios.  ( 2 min )
    A data filling methodology for time series based on CNN and (Bi)LSTM neural networks. (arXiv:2204.09994v1 [cs.LG])
    In the process of collecting data from sensors, several circumstances can affect their continuity and validity, resulting in alterations of the data or loss of information. Although classical methods of statistics, such as interpolation-like techniques, can be used to approximate the missing data in a time series, the recent developments in Deep Learning (DL) have given impetus to innovative and much more accurate forecasting techniques. In the present paper, we develop two DL models aimed at filling data gaps, for the specific case of internal temperature time series obtained from monitored apartments located in Bolzano, Italy. The DL models developed in the present work are based on the combination of Convolutional Neural Networks (CNNs), Long Short-Term Memory Neural Networks (LSTMs), and Bidirectional LSTMs (BiLSTMs). Two key features of our models are the use of both pre- and post-gap data, and the exploitation of a correlated time series (the external temperature) in order to predict the target one (the internal temperature). Our approach manages to capture the fluctuating nature of the data and shows good accuracy in reconstructing the target time series. In addition, our models significantly improve the already good results from another DL architecture that is used as a baseline for the present work.  ( 2 min )
    A Learned Index for Exact Similarity Search in Metric Spaces. (arXiv:2204.10028v1 [cs.DB])
    Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index has been explored actively to replace or supplement traditional index structures with machine learning models to reduce storage and search costs. However, accurate and efficient similarity query processing in high-dimensional metric spaces remains an open challenge. In this paper, a novel indexing approach called LIMS is proposed to use data clustering and pivot-based data transformation techniques to build learned indexes for efficient similarity query processing in metric spaces. The underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions and the mapped values are totally ordinal. Machine learning models are developed to approximate the position of each data record on the disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes and state-of-the-art learned indexes.  ( 2 min )
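    The pivot-based mapping and learned position model can be sketched as follows: map each record to a 1-D key via pivot distances, sort, and fit a model predicting disk position from key. The single-cluster setup, nearest-pivot key, and linear model are simplifying assumptions, not LIMS itself.

```python
import numpy as np

def pivot_keys(data, pivots):
    """Map each record to a 1-D ordinal key: distance to its nearest pivot."""
    d = np.linalg.norm(data[:, None, :] - pivots[None, :, :], axis=2)  # (n, p)
    return d.min(axis=1)

def build_index(data, pivots):
    """Sort records by pivot key and fit a model from key to (disk) position."""
    keys = np.sort(pivot_keys(data, pivots))
    positions = np.arange(len(data))
    a, b = np.polyfit(keys, positions, 1)  # learned position predictor
    return keys, (a, b)

def predict_position(key, model):
    """Approximate position of a record with the given key; a range/kNN query
    would scan a small window around this prediction."""
    a, b = model
    return a * key + b
```

    In LIMS proper, this happens per cluster with per-cluster pivots, so each local key distribution is close to uniform and the model's prediction error stays small.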
    Inducing Gaussian Process Networks. (arXiv:2204.09889v1 [cs.LG])
    Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.  ( 2 min )
    Deep transfer learning for partial differential equations under conditional shift with DeepONet. (arXiv:2204.09810v1 [cs.LG])
    Traditional machine learning algorithms are designed to learn in isolation, i.e. address single tasks. The core idea of transfer learning (TL) is that knowledge gained in learning to perform one task (source) can be leveraged to improve learning performance in a related, but different, task (target). TL leverages and transfers previously acquired knowledge to address the expense of data acquisition and labeling, potential computational power limitations, and the dataset distribution mismatches. Although significant progress has been made in the fields of image processing, speech recognition, and natural language processing (for classification and regression) for TL, little work has been done in the field of scientific machine learning for functional regression and uncertainty quantification in partial differential equations. In this work, we propose a novel TL framework for task-specific learning under conditional shift with a deep operator network (DeepONet). Inspired by the conditional embedding operator theory, we measure the statistical distance between the source domain and the target feature domain by embedding conditional distributions onto a reproducing kernel Hilbert space. Task-specific operator learning is accomplished by fine-tuning task-specific layers of the target DeepONet using a hybrid loss function that allows for the matching of individual target samples while also preserving the global properties of the conditional distribution of target data. We demonstrate the advantages of our approach for various TL scenarios involving nonlinear PDEs under conditional shift. Our results include geometry domain adaptation and show that the proposed TL framework enables fast and efficient multi-task operator learning, despite significant differences between the source and target domains.  ( 2 min )
    Fairness in Graph Mining: A Survey. (arXiv:2204.09888v1 [cs.LG])
    Graph mining algorithms have been playing a significant role in myriad fields over the years. However, despite their promising performance on various graph analytical tasks, most of these algorithms lack fairness considerations. As a consequence, they could lead to discrimination towards certain populations when exploited in human-centered applications. Recently, algorithmic fairness has been extensively studied in graph-based applications. In contrast to algorithmic fairness on independent and identically distributed (i.i.d.) data, fairness in graph mining has exclusive backgrounds, taxonomies, and fulfilling techniques. In this survey, we provide a comprehensive and up-to-date introduction of existing literature under the context of fair graph mining. Specifically, we propose a novel taxonomy of fairness notions on graphs, which sheds light on their connections and differences. We further present an organized summary of existing techniques that promote fairness in graph mining. Finally, we summarize the widely used datasets in this emerging research field and provide insights on current research challenges and open questions, aiming at encouraging cross-breeding ideas and further advances.  ( 2 min )
    Ultra Marginal Feature Importance. (arXiv:2204.09938v1 [stat.ML])
    Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. The development of marginal feature importance methods, such as marginal contribution feature importance, attempts to break this trend by providing a useful framework for explaining relationships in data in an interpretable fashion. In this work, we generalize the framework of marginal contribution feature importance to improve performance with regards to detecting correlated interactions and reducing runtime. To do so, we consider "information subsets" of the set of features $F$ and show that our importance metric can be computed directly after applying fair representation learning methods from the AI fairness literature. The methods of optimal transport and linear regression are considered and explored experimentally for removing all the information of our feature of interest $f$ from the feature set $F$. Given these implementations, we show on real and simulated data that ultra marginal feature importance performs at least as well as marginal contribution feature importance, with substantially faster computation time and better performance in the presence of correlated interactions and unrelated features.  ( 2 min )
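    The linear-regression variant of removing a feature's information, one of the two implementations the abstract mentions, can be sketched as residualizing the feature set against $f$ and comparing model quality before and after. The scoring interface is an assumption for illustration.

```python
import numpy as np

def remove_feature_info(F, f):
    """Residualize each column of F against feature f via linear regression,
    approximating a representation with f's (linear) information removed."""
    X = np.column_stack([np.ones(len(f)), f])
    coef, *_ = np.linalg.lstsq(X, F, rcond=None)
    return F - X @ coef

def importance(score, F, f, y):
    """UMFI-style importance: model quality with the full feature set minus
    quality after f's information is removed. `score` is any model-quality
    function (higher is better); illustrative, not the paper's exact metric."""
    return score(F, y) - score(remove_feature_info(F, f), y)
```

    The optimal-transport variant mentioned in the abstract would replace `remove_feature_info` while leaving the importance computation unchanged.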
    Perception Visualization: Seeing Through the Eyes of a DNN. (arXiv:2204.09920v1 [cs.CV])
    Artificial intelligence (AI) systems power the world we live in. Deep neural networks (DNNs) are able to solve tasks in an ever-expanding landscape of scenarios, but our eagerness to apply these powerful models leads us to focus on their performance and deprioritises our ability to understand them. Current research in the field of explainable AI tries to bridge this gap by developing various perturbation or gradient-based explanation techniques. For images, these techniques fail to fully capture and convey the semantic information needed to elucidate why the model makes the predictions it does. In this work, we develop a new form of explanation that is radically different in nature from current explanation methods, such as Grad-CAM. Perception visualization provides a visual representation of what the DNN perceives in the input image by depicting what visual patterns the latent representation corresponds to. Visualizations are obtained through a reconstruction model that inverts the encoded features, such that the parameters and predictions of the original models are not modified. Results of our user study demonstrate that humans can better understand and predict the system's decisions when perception visualizations are available, thus easing the debugging and deployment of deep models as trusted systems.  ( 2 min )
    GUARD: Graph Universal Adversarial Defense. (arXiv:2204.09803v1 [cs.LG])
    Recently, graph convolutional networks (GCNs) have shown to be vulnerable to small adversarial perturbations, which becomes a severe threat and largely limits their applications in security-critical scenarios. To mitigate such a threat, considerable research efforts have been devoted to increasing the robustness of GCNs against adversarial attacks. However, current approaches for defense are typically designed for the whole graph and consider the global performance, posing challenges in protecting important local nodes from stronger adversarial targeted attacks. In this work, we present a simple yet effective method, named \textbf{\underline{G}}raph \textbf{\underline{U}}niversal \textbf{\underline{A}}dve\textbf{\underline{R}}sarial \textbf{\underline{D}}efense (GUARD). Unlike previous works, GUARD protects each individual node from attacks with a universal defensive patch, which is generated once and can be applied to any node (node-agnostic) in a graph. Extensive experiments on four benchmark datasets demonstrate that our method significantly improves robustness for several established GCNs against multiple adversarial attacks and outperforms existing adversarial defense methods by large margins. Our code is publicly available at https://github.com/EdisonLeeeee/GUARD.  ( 2 min )
    MedFACT: Modeling Medical Feature Correlations in Patient Health Representation Learning via Feature Clustering. (arXiv:2204.10011v1 [cs.LG])
    In healthcare prediction tasks, it is essential to exploit the correlations between medical features and learn better patient health representations. Existing methods try to estimate feature correlations only from data, or increase the quality of estimation by introducing task-specific medical knowledge. However, such methods either struggle to estimate the feature correlations due to insufficient training samples, or cannot be generalized to other tasks due to their reliance on specific knowledge. Medical research reveals that not all medical features are strongly correlated. Thus, to address the issues, we expect to group up strongly correlated features and learn feature correlations in a group-wise manner to reduce the learning complexity without losing generality. In this paper, we propose a general patient health representation learning framework MedFACT. We estimate correlations via measuring similarity between temporal patterns of medical features with kernel methods, and cluster features with strong correlations into groups. The feature group is further formulated as a correlation graph, and we employ graph convolutional networks to conduct group-wise feature interactions for better representation learning. Experiments on two real-world datasets demonstrate the superiority of MedFACT. The discovered medical findings are also confirmed by literature, providing valuable medical insights and explanations.  ( 2 min )
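    The first stage, kernel similarity between temporal patterns followed by grouping of strongly correlated features, can be sketched as below. The RBF kernel choice and greedy grouping are stand-ins for the paper's kernel methods and clustering, purely for illustration.

```python
import numpy as np

def rbf_similarity(series, gamma=0.5):
    """Pairwise RBF-kernel similarity between medical-feature time series.
    series: (F, T), one row per feature. (Illustrative kernel choice.)"""
    d2 = ((series[:, None, :] - series[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * d2)

def group_features(sim, threshold=0.5):
    """Greedy grouping of strongly correlated features (a simple stand-in for
    clustering): each unassigned feature seeds a group of its similar peers."""
    n = sim.shape[0]
    groups, assigned = [], set()
    for i in range(n):
        if i in assigned:
            continue
        g = [i] + [j for j in range(i + 1, n)
                   if j not in assigned and sim[i, j] >= threshold]
        assigned.update(g)
        groups.append(g)
    return groups
```

    Each resulting group would then be turned into a correlation graph on which the framework's graph convolutional networks operate.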
    fairDMS: Rapid Model Training by Data and Model Reuse. (arXiv:2204.09805v1 [cs.LG])
    Extracting actionable information from data sources such as the Linac Coherent Light Source (LCLS-II) and Advanced Photon Source Upgrade (APS-U) is becoming more challenging due to the fast-growing data generation rate. The rapid analysis possible with ML methods can enable fast feedback loops that can be used to adjust experimental setups in real-time, for example when errors occur or interesting events are detected. However, to avoid degradation in ML performance over time due to changes in an instrument or sample, we need a way to update ML models rapidly while an experiment is running. We present here a data service and model service to accelerate deep neural network training with a focus on ML-based scientific applications. Our proposed data service achieves 100x speedup in terms of data labeling compared to the current state-of-the-art. Further, our model service achieves up to 200x improvement in training speed. Overall, fairDMS achieves up to 92x speedup in terms of end-to-end model updating time.  ( 2 min )
    Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations. (arXiv:2204.09787v1 [cs.LG])
    Despite the success of reinforcement learning (RL) for Markov decision processes (MDPs) with function approximation, most RL algorithms easily fail if the agent only has partial observations of the state. Such a setting is often modeled as a partially observable Markov decision process (POMDP). Existing sample-efficient algorithms for POMDPs are restricted to the tabular setting where the state and observation spaces are finite. In this paper, we make the first attempt at tackling the tension between function approximation and partial observability. Specifically, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite. For such POMDPs, we show that the optimal policy and value function can be characterized by a sequence of finite-memory Bellman operators. We propose an RL algorithm that constructs optimistic estimators of these operators via reproducing kernel Hilbert space (RKHS) embedding. Moreover, we theoretically prove that the proposed algorithm finds an $\varepsilon$-optimal policy with $\tilde O (1/\varepsilon^2)$ episodes of exploration. Also, this sample complexity only depends on the intrinsic dimension of the POMDP polynomially and is independent of the size of the state and observation spaces. To the best of our knowledge, we develop the first provably sample-efficient algorithm for POMDPs with function approximation.  ( 2 min )
    Scaling Language Model Size in Cross-Device Federated Learning. (arXiv:2204.09715v1 [cs.CL])
    Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a $21$M parameter Transformer that achieves the same perplexity as that of a similarly sized LSTM with $\sim10\times$ smaller client-to-server communication cost and $11\%$ lower perplexity than smaller LSTMs commonly studied in literature.  ( 2 min )
    A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines. (arXiv:2204.09772v1 [cs.AI])
    A misspecified reward can degrade sample efficiency and induce undesired behaviors in reinforcement learning (RL) problems. We propose symbolic reward machines for incorporating high-level task knowledge when specifying the reward signals. Symbolic reward machines augment existing reward machine formalism by allowing transitions to carry predicates and symbolic reward outputs. This formalism lends itself well to inverse reinforcement learning, whereby the key challenge is determining appropriate assignments to the symbolic values from a few expert demonstrations. We propose a hierarchical Bayesian approach for inferring the most likely assignments such that the concretized reward machine can discriminate expert demonstrated trajectories from other trajectories with high accuracy. Experimental results show that learned reward machines can significantly improve training efficiency for complex RL tasks and generalize well across different task environment configurations.  ( 2 min )
    Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach. (arXiv:2204.09746v1 [cs.LG])
    The limited communication resources, e.g., bandwidth and energy, and data heterogeneity across devices are two of the main bottlenecks for federated learning (FL). To tackle these challenges, we first devise a novel FL framework with partial model aggregation (PMA), which only aggregates the lower layers of neural networks responsible for feature extraction while the upper layers corresponding to complex pattern recognition remain at devices for personalization. The proposed PMA-FL is able to address the data heterogeneity and reduce the transmitted information in wireless channels. We then obtain a convergence bound of the framework under a non-convex loss function setting. With the aid of this bound, we define a new objective function, named the scheduled data sample volume, to transfer the original inexplicit optimization problem into a tractable one for device scheduling, bandwidth allocation, computation and communication time division. Our analysis reveals that the optimal time division is achieved when the communication and computation parts of PMA-FL have the same power. We also develop a bisection method to solve the optimal bandwidth allocation policy and use the set expansion algorithm to address the optimal device scheduling. Compared with the state-of-the-art benchmarks, the proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets, i.e., MNIST and CIFAR-10, respectively. In addition, the proposed joint dynamic device scheduling and resource optimization approach achieves slightly higher accuracy than the considered benchmarks while also providing substantial energy and time savings: 29% energy or 20% time reduction on MNIST; and 25% energy or 12.5% time reduction on CIFAR-10.  ( 2 min )
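    The partial aggregation step, averaging only the lower, feature-extraction layers while upper layers stay local, can be sketched as below. Models are represented as plain name-to-weights dicts for illustration; this is not the paper's implementation.

```python
def partial_aggregate(client_models, lower_layers):
    """Average only the named lower layers across clients; all other layers
    remain local for personalization (the PMA idea, in sketch form)."""
    n = len(client_models)
    shared = {
        name: [sum(m[name][i] for m in client_models) / n
               for i in range(len(client_models[0][name]))]
        for name in lower_layers
    }
    # Broadcast the averaged lower layers back; upper layers are untouched.
    return [{**m, **shared} for m in client_models]
```

    Only the lower layers traverse the wireless channel, which is how PMA-FL reduces transmitted information while accommodating heterogeneous local data.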
    Exact Formulas for Finite-Time Estimation Errors of Decentralized Temporal Difference Learning with Linear Function Approximation. (arXiv:2204.09801v1 [cs.LG])
    In this paper, we consider the policy evaluation problem in multi-agent reinforcement learning (MARL) and derive exact closed-form formulas for the finite-time mean-squared estimation errors of decentralized temporal difference (TD) learning with linear function approximation. Our analysis hinges upon the fact that the decentralized TD learning method can be viewed as a Markov jump linear system (MJLS). Then standard MJLS theory can be applied to quantify the mean and covariance matrix of the estimation error of the decentralized TD method at every time step. Various implications of our exact formulas on the algorithm performance are also discussed. An interesting finding is that under a necessary and sufficient stability condition, the mean-squared TD estimation error will converge to an exact limit at a specific exponential rate.  ( 2 min )
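    The MJLS viewpoint can be sketched as follows (symbols illustrative; the paper's exact operators may differ): writing $e_k$ for the TD estimation error and $\theta_k$ for the Markov mode induced by the sampling process, the error obeys a jump-linear recursion $e_{k+1} = A(\theta_k)\, e_k + b(\theta_k)$, so that $\mathbb{E}[e_k]$ and $\mathbb{E}[e_k e_k^\top]$, conditioned on the mode, satisfy linear recursions whose closed-form solutions give the exact finite-time mean and covariance at every step.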
    Matching Writers to Content Writing Tasks. (arXiv:2204.09718v1 [cs.CL])
    Businesses need content. In various forms and formats and for varied purposes. In fact, the content marketing industry is set to be worth $412.88 billion by the end of 2021. However, according to the Content Marketing Institute, creating engaging content is the #1 challenge that marketers face today. We under-stand that producing great content requires great writers who understand the business and can weave their message into reader (and search engine) friendly content. In this project, the team has attempted to bridge the gap between writers and projects by using AI and ML tools. We used NLP techniques to analyze thou-sands of publicly available business articles (corpora) to extract various defining factors for each writing sample. Through this project we aim to automate the highly time-consuming, and often biased task of manually shortlisting the most suitable writer for a given content writing requirement. We believe that a tool like this will have far reaching positive implications for both parties - businesses looking for suitable talent for niche writing jobs as well as experienced writers and Subject Matter Experts (SMEs) wanting to lend their services to content marketing projects. The business gets the content they need, the content writer/ SME gets a chance to leverage his or her talent, while the reader gets authentic content that adds real value.  ( 2 min )
    Generative Pre-Trained Transformers for Biologically Inspired Design. (arXiv:2204.09714v1 [cs.CL])
    Biological systems in nature have evolved for millions of years to adapt and survive the environment. Many features they developed can be inspirational and beneficial for solving technical problems in modern industries. This leads to a novel form of design-by-analogy called bio-inspired design (BID). Although BID as a design method has been proven beneficial, the gap between biology and engineering continuously hinders designers from effectively applying the method. Therefore, we explore the recent advance of artificial intelligence (AI) for a computational approach to bridge the gap. This paper proposes a generative design approach based on the pre-trained language model (PLM) to automatically retrieve and map biological analogy and generate BID in the form of natural language. The latest generative pre-trained transformer, namely GPT-3, is used as the base PLM. Three types of design concept generators are identified and fine-tuned from the PLM according to the looseness of the problem space representation. Machine evaluators are also fine-tuned to assess the correlation between the domains within the generated BID concepts. The approach is then tested via a case study in which the fine-tuned models are applied to generate and evaluate light-weighted flying car concepts inspired by nature. The results show our approach can generate BID concepts with good performance.  ( 2 min )
    A majorization-minimization algorithm for nonnegative binary matrix factorization. (arXiv:2204.09741v1 [cs.LG])
    This paper tackles the problem of decomposing binary data using matrix factorization. We consider the family of mean-parametrized Bernoulli models, a class of generative models that are well suited for modeling binary data and enable interpretability of the factors. We factorize the Bernoulli parameter and consider an additional Beta prior on one of the factors to further improve the model's expressive power. While similar models have been proposed in the literature, they only exploit the Beta prior as a proxy to ensure a valid Bernoulli parameter in a Bayesian setting; in practice it reduces to a uniform or uninformative prior. Besides, estimation in these models has focused on costly Bayesian inference. In this paper, we propose a simple yet very efficient majorization-minimization algorithm for maximum a posteriori estimation. Our approach leverages the Beta prior whose parameters can be tuned to improve performance in matrix completion tasks. Experiments conducted on three public binary datasets show that our approach offers an excellent trade-off between prediction performance, computational complexity, and interpretability.  ( 2 min )
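    A minimal generative sketch of the mean-parametrized Bernoulli model: the Bernoulli parameter matrix is factorized as P = W H with entries constrained to [0, 1]. The shapes and the simplex normalization used to enforce that constraint are illustrative assumptions; the paper's MM updates and Beta prior are not reproduced here:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 20, 30, 4
W = rng.dirichlet(np.ones(k), size=m)      # rows on the simplex => (W @ H) in [0, 1]
H = rng.random((k, n))                     # entries in [0, 1]
P = W @ H                                  # factorized Bernoulli parameter
X = (rng.random((m, n)) < P).astype(int)   # sampled binary data matrix
```

    Because each entry of P is a convex combination of entries of H, the factorization is directly interpretable as mixing proportions over k binary "prototypes".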
    FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow. (arXiv:2204.09679v1 [cs.CV])
    Super-resolution suffers from an innate ill-posed problem: a single low-resolution (LR) image can correspond to multiple high-resolution (HR) images. Recent studies on flow-based algorithms solve this ill-posedness by learning the super-resolution space and predicting diverse HR outputs. Unfortunately, the diversity of the super-resolution outputs is still unsatisfactory, and the outputs from the flow-based model usually suffer from undesired artifacts which cause low-quality outputs. In this paper, we propose FS-NCSR, which produces diverse and high-quality super-resolution outputs using frequency separation and noise conditioning compared to the existing flow-based approaches. As the sharpness and high-quality detail of the image rely on its high-frequency information, FS-NCSR only estimates the high-frequency information of the high-resolution outputs without redundant low-frequency components. Through this, FS-NCSR significantly improves the diversity score without significant image quality degradation compared to NCSR, the winner of the previous NTIRE 2021 challenge.  ( 2 min )
    Deep Learning meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?. (arXiv:2204.09664v2 [cs.LG] UPDATED)
    We study the theory of neural networks (NNs) through the lens of classical nonparametric regression problems, with a focus on NNs' ability to adaptively estimate functions with heterogeneous smoothness -- a property of functions in Besov or Bounded Variation (BV) classes. Existing work on this problem requires tuning the NN architecture based on the function spaces and sample sizes. We consider a "Parallel NN" variant of deep ReLU networks and show that the standard weight decay is equivalent to promoting the $\ell_p$-sparsity ($0<p<1$) of the coefficient vector of an end-to-end learned function basis, i.e., a dictionary. Using this equivalence, we further establish that by tuning only the weight decay, such a Parallel NN achieves an estimation error arbitrarily close to the minimax rates for both the Besov and BV classes. Notably, it gets exponentially closer to minimax optimal as the NN gets deeper. Our research sheds new light on why depth matters and how NNs are more powerful than kernel methods.  ( 2 min )
    The Silent Problem -- Machine Learning Model Failure -- How to Diagnose and Fix Ailing Machine Learning Models. (arXiv:2204.10227v1 [cs.LG])
    The COVID-19 pandemic has dramatically changed how healthcare is delivered to patients, how patients interact with healthcare providers, and how healthcare information is disseminated to both healthcare providers and patients. Analytical models that were trained and tested pre-pandemic may no longer be performing up to expectations, yielding unreliable and irrelevant machine learning (ML) models, given that ML depends on the basic principle that what happened in the past is likely to repeat in the future. ML faces two important degradation phenomena: concept drift, when the underlying properties and characteristics of the variables change, and data drift, when the data distributions, probabilities, covariates, and other variable relationships change; both are prime culprits of model failure. Therefore, detecting and diagnosing drift in existing models has become an imperative. Perhaps even more important is a shift in our mindset towards a conscious recognition that drift is inevitable, and that model building must incorporate intentional resilience, the ability to offset and recover quickly from failure, and proactive robustness, avoiding failure by developing models that are less vulnerable to drift and disruption.  ( 2 min )
    Strong posterior contraction rates via Wasserstein dynamics. (arXiv:2203.10754v2 [math.ST] UPDATED)
    In this paper, we develop a novel approach to posterior contraction rates (PCRs), for both finite-dimensional (parametric) and infinite-dimensional (nonparametric) Bayesian models. Critical to our approach is the combination of an assumption of local Lipschitz-continuity for the posterior distribution with a dynamic formulation of the Wasserstein distance, here referred to as Wasserstein dynamics, which allows us to set forth a connection between the problem of establishing PCRs and some classical problems in mathematical analysis, probability theory and mathematical statistics: the Laplace method for approximating integrals, Sanov's large deviation principles in the Wasserstein distance, rates of convergence of the mean Glivenko-Cantelli theorem, and estimates of weighted Poincar\'e-Wirtinger constants. Under dominated Bayesian models, we present two main results: i) a theorem on PCRs for the regular infinite-dimensional exponential family of statistical models; ii) a theorem on PCRs for a general dominated statistical model. Some applications of our results are presented for the regular parametric model, the multinomial model, the finite-dimensional and the infinite-dimensional logistic-Gaussian model and the infinite-dimensional linear regression. In general, our results lead to optimal PCRs in finite dimension, whereas in infinite dimension it is shown how the prior distribution may affect PCRs. With regards to infinite-dimensional Bayesian models for density estimation, our approach to PCRs is the first to consider strong norm distances on parameter spaces of functions, such as Sobolev-like norms, as most of the approaches in the classical (frequentist) and Bayesian literature deal with spaces of density functions endowed with $\mathrm{L}^p$ norms or the Hellinger distance.  ( 2 min )
    Towards Resolving Propensity Contradiction in Offline Recommender Learning. (arXiv:1910.07295v6 [stat.ML] UPDATED)
    We study offline recommender learning from explicit rating feedback in the presence of selection bias. A current promising solution for the bias is the inverse propensity score (IPS) estimation. However, the performance of existing propensity-based methods can suffer significantly from the propensity estimation bias. In fact, most of the previous IPS-based methods require some amount of missing-completely-at-random (MCAR) data to accurately estimate the propensity. This leads to a critical self-contradiction; IPS is ineffective without MCAR data, even though it originally aims to learn recommenders from only missing-not-at-random feedback. To resolve this propensity contradiction, we derive a propensity-independent generalization error bound and propose a novel algorithm to minimize the theoretical bound via adversarial learning. Our theory and algorithm do not require a propensity estimation procedure, thereby leading to a well-performing rating predictor without the true propensity information. Extensive experiments demonstrate that the proposed approach is superior to a range of existing methods both in rating prediction and ranking metrics in practical settings without MCAR data.  ( 2 min )
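    The IPS baseline that the paper contrasts itself with can be sketched on fabricated data: under missing-not-at-random observation, the naive empirical loss is biased, while reweighting each observed loss by its inverse propensity recovers an unbiased estimate. The toy rating model, propensities, and fixed predictor below are assumptions for illustration; the paper's propensity-free method is not shown:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
rating = rng.integers(1, 6, size=n).astype(float)  # true ratings 1..5
prop = 0.1 + 0.03 * rating                         # MNAR: high ratings observed more often
obs = rng.random(n) < prop                         # observation indicators

pred = np.full(n, 4.0)                             # a fixed predictor to evaluate
loss = (pred - rating) ** 2
naive = loss[obs].mean()                           # biased: ignores selection
ips = (loss / prop)[obs].sum() / n                 # inverse-propensity-weighted estimate
oracle = loss.mean()                               # loss on all (unobservable) entries
```

    The sketch uses the true propensities; the paper's point is precisely that in practice these must be estimated, often requiring MCAR data, which motivates their propensity-independent bound.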
    Wrapped Distributions on homogeneous Riemannian manifolds. (arXiv:2204.09790v1 [math.ST])
    We provide a general framework for constructing probability distributions on Riemannian manifolds, taking advantage of area-preserving maps and isometries. Control over distributions' properties, such as parameters, symmetry and modality yield a family of flexible distributions that are straightforward to sample from, suitable for use within Monte Carlo algorithms and latent variable models, such as autoencoders. As an illustration, we empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model. Finally, we take advantage of the generalized description of this framework to posit questions for future work.  ( 2 min )
    Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations. (arXiv:2204.09787v1 [cs.LG])
    Despite the success of reinforcement learning (RL) for Markov decision processes (MDPs) with function approximation, most RL algorithms easily fail if the agent only has partial observations of the state. Such a setting is often modeled as a partially observable Markov decision process (POMDP). Existing sample-efficient algorithms for POMDPs are restricted to the tabular setting where the state and observation spaces are finite. In this paper, we make the first attempt at tackling the tension between function approximation and partial observability. Specifically, we focus on a class of undercomplete POMDPs with linear function approximations, which allows the state and observation spaces to be infinite. For such POMDPs, we show that the optimal policy and value function can be characterized by a sequence of finite-memory Bellman operators. We propose an RL algorithm that constructs optimistic estimators of these operators via reproducing kernel Hilbert space (RKHS) embedding. Moreover, we theoretically prove that the proposed algorithm finds an $\varepsilon$-optimal policy with $\tilde O (1/\varepsilon^2)$ episodes of exploration. Also, this sample complexity only depends on the intrinsic dimension of the POMDP polynomially and is independent of the size of the state and observation spaces. To the best of our knowledge, we develop the first provably sample-efficient algorithm for POMDPs with function approximation.
    Computationally Efficient and Statistically Optimal Robust Low-rank Matrix and Tensor Estimation. (arXiv:2203.00953v3 [math.ST] UPDATED)
    Low-rank matrix estimation under heavy-tailed noise is challenging, both computationally and statistically. Convex approaches have been proven statistically optimal but suffer from high computational costs, especially since robust loss functions are usually non-smooth. More recently, computationally fast non-convex approaches via sub-gradient descent are proposed, which, unfortunately, fail to deliver a statistically consistent estimator even under sub-Gaussian noise. In this paper, we introduce a novel Riemannian sub-gradient (RsGrad) algorithm which is not only computationally efficient with linear convergence but also is statistically optimal, be the noise Gaussian or heavy-tailed. Convergence theory is established for a general framework and specific applications to absolute loss, Huber loss, and quantile loss are investigated. Compared with existing non-convex methods, ours reveals a surprising phenomenon of dual-phase convergence. In phase one, RsGrad behaves as in a typical non-smooth optimization that requires gradually decaying stepsizes. However, phase one only delivers a statistically sub-optimal estimator which is already observed in the existing literature. Interestingly, during phase two, RsGrad converges linearly as if minimizing a smooth and strongly convex objective function and thus a constant stepsize suffices. Underlying the phase-two convergence is the smoothing effect of random noise to the non-smooth robust losses in an area close but not too close to the truth. Lastly, RsGrad is applicable for low-rank tensor estimation under heavy-tailed noise where a statistically optimal rate is attainable with the same phenomenon of dual-phase convergence, and a novel shrinkage-based second-order moment method is guaranteed to deliver a warm initialization. Numerical simulations confirm our theoretical discovery and showcase the superiority of RsGrad over prior methods.
    Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning. (arXiv:2106.09226v2 [cs.LG] UPDATED)
    Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
    Bayesian Learning via Neural Schr\"odinger-F\"ollmer Flows. (arXiv:2111.10510v8 [stat.ML] UPDATED)
    In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control (i.e. Schr\"odinger bridges). We advocate stochastic control as a finite time and low variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to already existing VI routines in SDE-based models.
    From Stars to Subgraphs: Uplifting Any GNN with Local Structure Awareness. (arXiv:2110.03753v3 [cs.LG] UPDATED)
    Message Passing Neural Networks (MPNNs) are a common type of Graph Neural Network (GNN), in which each node's representation is computed recursively by aggregating representations (messages) from its immediate neighbors akin to a star-shaped pattern. MPNNs are appealing for being efficient and scalable; however, their expressiveness is upper-bounded by the 1st-order Weisfeiler-Lehman isomorphism test (1-WL). In response, prior works propose highly expressive models at the cost of scalability and sometimes generalization performance. Our work stands between these two regimes: we introduce a general framework to uplift any MPNN to be more expressive, with limited scalability overhead and greatly improved practical performance. We achieve this by extending local aggregation in MPNNs from star patterns to general subgraph patterns (e.g., k-egonets): in our framework, each node representation is computed as the encoding of a surrounding induced subgraph rather than encoding of immediate neighbors only (i.e. a star). We choose the subgraph encoder to be a GNN (mainly MPNNs, considering scalability) to design a general framework that serves as a wrapper to uplift any GNN. We call our proposed method GNN-AK (GNN As Kernel), as the framework resembles a convolutional neural network by replacing the kernel with GNNs. Theoretically, we show that our framework is strictly more powerful than 1&2-WL, and is not less powerful than 3-WL. We also design subgraph sampling strategies which greatly reduce memory footprint and improve speed while maintaining performance. Our method sets new state-of-the-art performance by large margins for several well-known graph ML tasks; specifically, 0.08 MAE on ZINC, 74.79% and 86.887% accuracy on CIFAR10 and PATTERN respectively.
    Backplay: "Man muss immer umkehren". (arXiv:1807.06919v5 [cs.LG] UPDATED)
    Model-free reinforcement learning (RL) requires a large number of trials to learn a good policy, especially in environments with sparse rewards. We explore a method to improve the sample efficiency when we have access to demonstrations. Our approach, Backplay, uses a single demonstration to construct a curriculum for a given task. Rather than starting each training episode in the environment's fixed initial state, we start the agent near the end of the demonstration and move the starting point backwards during the course of training until we reach the initial state. Our contributions are that we analytically characterize the types of environments where Backplay can improve training speed, demonstrate the effectiveness of Backplay both in large grid worlds and a complex four player zero-sum game (Pommerman), and show that Backplay compares favorably to other competitive methods known to improve sample efficiency. This includes reward shaping, behavioral cloning, and reverse curriculum generation.
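    The core of Backplay is a schedule mapping training progress to a start state along the demonstration. A linear schedule is one illustrative choice (the paper uses its own windowed schedules, which are not reproduced here):

```python
def backplay_start_index(step, total_steps, demo_len):
    """Start episodes near the end of the demonstration early in training,
    sliding the start point back to the initial state as training proceeds.
    The linear schedule here is an illustrative assumption."""
    frac = min(step / total_steps, 1.0)
    return max(int(round((demo_len - 1) * (1.0 - frac))), 0)
```

    An episode at training step t would then be reset to the state the demonstrator visited at index backplay_start_index(t, total_steps, demo_len), so early episodes begin close to the goal and later ones recover the original task.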
    Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data. (arXiv:2009.09139v3 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and negative task transfer, or learning interference. Often, in Natural Language Processing (NLP), a separate model per task is needed to obtain the best performance. However, many fine-tuning approaches are both parameter inefficient, i.e., potentially involving one new model per task, and highly susceptible to losing knowledge acquired during pretraining. We propose a novel Transformer architecture consisting of a new conditional attention mechanism as well as a set of task-conditioned modules that facilitate weight sharing. Through this construction (a hypernetwork adapter), we achieve more efficient parameter sharing and mitigate forgetting by keeping half of the weights of a pretrained model fixed. We also use a new multi-task data sampling strategy to mitigate the negative effects of data imbalance across tasks. Using this approach, we are able to surpass single-task fine-tuning methods while being parameter and data efficient (using around 66% of the data for weight updates). Compared to other BERT Large methods on GLUE, our 8-task model surpasses other Adapter methods by 2.8% and our 24-task model outperforms models that use MTL and single-task fine-tuning by 0.7-1.0%. We show that a larger variant of our single multi-task model approach performs competitively across 26 NLP tasks and yields state-of-the-art results on a number of test and development sets. Our code is publicly available at https://github.com/CAMTL/CA-MTL.
    Theoretical Analysis of Self-Training with Deep Networks on Unlabeled Data. (arXiv:2010.03622v5 [cs.LG] UPDATED)
    Self-training algorithms, which train a model to fit pseudolabels predicted by another previously-learned model, have been very successful for learning with unlabeled data using neural networks. However, the current theoretical understanding of self-training only applies to linear models. This work provides a unified theoretical analysis of self-training with deep networks for semi-supervised learning, unsupervised domain adaptation, and unsupervised learning. At the core of our analysis is a simple but realistic "expansion" assumption, which states that a low probability subset of the data must expand to a neighborhood with large probability relative to the subset. We also assume that neighborhoods of examples in different classes have minimal overlap. We prove that under these assumptions, the minimizers of population objectives based on self-training and input-consistency regularization will achieve high accuracy with respect to ground-truth labels. By using off-the-shelf generalization bounds, we immediately convert this result to sample complexity guarantees for neural nets that are polynomial in the margin and Lipschitzness. Our results help explain the empirical successes of recently proposed self-training algorithms which use input consistency regularization.
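    The self-training loop the analysis concerns can be sketched in one dimension with a trivial threshold "model" (the data, the confidence rule, and the midpoint classifier are all fabricated for illustration; the paper's results are about deep networks):

```python
import numpy as np

rng = np.random.default_rng(0)
# Four labeled points and many unlabeled points from two 1-D classes.
x_lab = np.array([-2.0, -1.5, 1.5, 2.0])
y_lab = np.array([0, 0, 1, 1])
x_unl = np.concatenate([rng.normal(-1.6, 0.5, 200), rng.normal(1.6, 0.5, 200)])

def fit_threshold(x, y):
    # "Model": decision threshold at the midpoint of the two class means.
    return 0.5 * (x[y == 0].mean() + x[y == 1].mean())

thr = fit_threshold(x_lab, y_lab)
for _ in range(3):                       # self-training rounds
    keep = np.abs(x_unl - thr) > 0.8     # keep only confident pseudolabels
    y_pseudo = (x_unl > thr).astype(int)
    thr = fit_threshold(np.concatenate([x_lab, x_unl[keep]]),
                        np.concatenate([y_lab, y_pseudo[keep]]))
```

    The "expansion" assumption in the paper plays the role that cluster separation plays in this toy: confident pseudolabels propagate correct labels into the neighborhoods of each class.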
    Scale Dependencies and Self-Similarity Through Wavelet Scattering Covariance. (arXiv:2204.10177v1 [physics.data-an])
    We introduce a scattering covariance matrix which provides non-Gaussian models of time-series having stationary increments. A complex wavelet transform computes signal variations at each scale. Dependencies across scales are captured by the joint covariance across time and scales of complex wavelet coefficients and their modulus. This covariance is nearly diagonalized by a second wavelet transform, which defines the scattering covariance. We show that this set of moments characterizes a wide range of non-Gaussian properties of multi-scale processes. This is analyzed for a variety of processes, including fractional Brownian motions, Poisson, multifractal random walks and Hawkes processes. We prove that self-similar processes have a scattering covariance matrix which is scale invariant. This property can be estimated numerically and defines a class of wide-sense self-similar processes. We build maximum entropy models conditioned by scattering covariance coefficients, and generate new time-series with a microcanonical sampling algorithm. Applications are shown for highly non-Gaussian financial and turbulence time-series.
    Beyond the density operator and Tr(\rho A): Exploiting the higher-order statistics of random-coefficient pure states for quantum information processing. (arXiv:2204.10031v1 [quant-ph])
    Two types of states are widely used in quantum mechanics, namely (deterministic-coefficient) pure states and statistical mixtures. A density operator can be associated with each of them. We here address a third type of states, that we previously introduced in a more restricted framework. These states generalize pure ones by replacing each of their deterministic ket coefficients by a random variable. We therefore call them Random-Coefficient Pure States, or RCPS. We analyze their properties and their relationships with both types of usual states. We show that RCPS contain much richer information than the density operator and mean of observables that we associate with them. This occurs because the latter operator only exploits the second-order statistics of the random state coefficients, whereas their higher-order statistics contain additional information. That information can be accessed in practice with the multiple-preparation procedure that we propose for RCPS, by using second-order and higher-order statistics of associated random probabilities of measurement outcomes. Exploiting these higher-order statistics opens the way to a very general approach for performing advanced quantum information processing tasks. We illustrate the relevance of this approach with a generic example, dealing with the estimation of parameters of a quantum process and thus related to quantum process tomography. This parameter estimation is performed in the non-blind (i.e. supervised) or blind (i.e. unsupervised) mode. We show that this problem cannot be solved by using only the density operator \rho of an RCPS and the associated mean value Tr(\rho A) of the operator A that corresponds to the considered physical quantity. We succeed in solving this problem by exploiting a fourth-order statistical parameter of state coefficients, in addition to second-order statistics. Numerical tests validate this result.
    Infographics Wizard: Flexible Infographics Authoring and Design Exploration. (arXiv:2204.09904v1 [cs.HC])
    Infographics are an aesthetic visual representation of information following specific design principles of human perception. Designing infographics can be tedious for non-experts and time-consuming even for professional designers. With the help of designers, we propose a semi-automated infographic framework for general structured and flow-based infographic design generation. For novice designers, our framework automatically creates and ranks infographic designs for a user-provided text with no requirement for design input. However, expert designers can still provide custom design inputs to customize the infographics. We also contribute an individual visual group (VG) design dataset (in SVG), along with a 1k complete infographic image dataset with segmented VGs. Evaluation results confirm that by using our framework, designers from all expertise levels can generate generic infographic designs faster than existing methods while maintaining the same quality as hand-designed infographic templates.  ( 2 min )
    Inducing Gaussian Process Networks. (arXiv:2204.09889v1 [cs.LG])
    Gaussian processes (GPs) are powerful but computationally expensive machine learning models, requiring an estimate of the kernel covariance matrix for every prediction. In large and complex domains, such as graphs, sets, or images, the choice of suitable kernel can also be non-trivial to determine, providing an additional obstacle to the learning task. Over the last decade, these challenges have resulted in significant advances being made in terms of scalability and expressivity, exemplified by, e.g., the use of inducing points and neural network kernel approximations. In this paper, we propose inducing Gaussian process networks (IGN), a simple framework for simultaneously learning the feature space as well as the inducing points. The inducing points, in particular, are learned directly in the feature space, enabling a seamless representation of complex structured domains while also facilitating scalable gradient-based learning methods. We consider both regression and (binary) classification tasks and report on experimental results for real-world data sets showing that IGNs provide significant advances over state-of-the-art methods. We also demonstrate how IGNs can be used to effectively model complex domains using neural network architectures.
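    For context, the classical inducing-point construction that IGN builds on can be sketched with a fixed set of inducing inputs and the subset-of-regressors predictive mean. The data, kernel lengthscale, and inducing locations below are illustrative assumptions; learning the inducing points in a feature space, as IGN does, is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel between 1-D input sets.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

x = np.linspace(0.0, 5.0, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)   # noisy observations
z = np.linspace(0.0, 5.0, 10)                       # fixed inducing inputs
Kzz, Kzx = rbf(z, z), rbf(z, x)
sigma2 = 0.1 ** 2

# Subset-of-regressors predictive mean at the training inputs:
# f = Kxz (sigma2 Kzz + Kzx Kxz)^{-1} Kzx y
mean = Kzx.T @ np.linalg.solve(sigma2 * Kzz + Kzx @ Kzx.T, Kzx @ y)
```

    The point of the approximation is that all linear algebra involves only the 10 inducing points rather than the full 200-point kernel matrix, which is the scalability lever IGN then combines with learned feature spaces.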
    Murmurations of elliptic curves. (arXiv:2204.10140v1 [math.NT])
    We investigate the average value of the $p$th Dirichlet coefficients of elliptic curves for a prime $p$ in a fixed conductor range with given rank. Plotting this average yields a striking oscillating pattern, the details of which vary with the rank. Based on this observation, we perform various data-scientific experiments with the goal of classifying elliptic curves according to their ranks.
    Scalable One-Pass Optimisation of High-Dimensional Weight-Update Hyperparameters by Implicit Differentiation. (arXiv:2110.10461v3 [cs.LG] UPDATED)
    Machine learning training methods depend plentifully and intricately on hyperparameters, motivating automated strategies for their optimisation. Many existing algorithms restart training for each new hyperparameter choice, at considerable computational cost. Some hypergradient-based one-pass methods exist, but these either cannot be applied to arbitrary optimiser hyperparameters (such as learning rates and momenta) or take several times longer to train than their base models. We extend these existing methods to develop an approximate hypergradient-based hyperparameter optimiser which is applicable to any continuous hyperparameter appearing in a differentiable model weight update, yet requires only one training episode, with no restarts. We also provide a motivating argument for convergence to the true hypergradient, and perform tractable gradient-based optimisation of independent learning rates for each model parameter. Our method performs competitively from varied random hyperparameter initialisations on several UCI datasets and Fashion-MNIST (using a one-layer MLP), Penn Treebank (using an LSTM) and CIFAR-10 (using a ResNet-18), in time only 2-3x greater than vanilla training.  ( 2 min )
    Intact-VAE: Estimating Treatment Effects under Unobserved Confounding. (arXiv:2101.06662v3 [stat.ML] UPDATED)
    NOTE: This preprint has a flawed theoretical formulation. Please avoid it and refer to the ICLR22 publication https://openreview.net/forum?id=q7n2RngwOM. Also, arXiv:2109.15062 contains some new ideas on unobserved confounding. As an important problem of causal inference, we discuss the identification and estimation of treatment effects under unobserved confounding. Representing the confounder as a latent variable, we propose Intact-VAE, a new variant of variational autoencoder (VAE), motivated by the prognostic score that is sufficient for identifying treatment effects. We theoretically show that, under certain settings, treatment effects are identified by our model, and further, based on the identifiability of our model (i.e., determinacy of representation), our VAE is a consistent estimator with representation balanced for treatment groups. Experiments on (semi-)synthetic datasets show state-of-the-art performance under diverse settings.  ( 2 min )
    The NIST CTS Speaker Recognition Challenge. (arXiv:2204.10228v1 [eess.AS])
    The US National Institute of Standards and Technology (NIST) has been conducting a second iteration of the CTS challenge since August 2020. The current iteration of the CTS Challenge is a leaderboard-style speaker recognition evaluation using telephony data extracted from the unexposed portions of the Call My Net 2 (CMN2) and Multi-Language Speech (MLS) corpora collected by the LDC. The CTS Challenge is currently organized in a similar manner to the SRE19 CTS Challenge, offering only an open training condition using two evaluation subsets, namely Progress and Test. Unlike in the SRE19 Challenge, no training or development set was initially released, and NIST has publicly released the leaderboards on both subsets for the CTS Challenge. Which subset (i.e., Progress or Test) a trial belongs to is unknown to challenge participants, and each system submission needs to contain outputs for all of the trials. The CTS Challenge has also served, and will continue to do so, as a prerequisite for entrance to the regular SREs (such as SRE21). Since August 2020, a total of 53 organizations (forming 33 teams) from academia and industry have participated in the CTS Challenge and submitted more than 4400 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for some primary conditions in the CTS Challenge. The CTS Challenge results thus far indicate remarkable improvements in performance due to 1) speaker embeddings extracted using large-scale and complex neural network architectures such as ResNets along with angular margin losses for speaker embedding extraction, 2) extensive data augmentation, 3) the use of large amounts of in-house proprietary data from a large number of labeled speakers, 4) long-duration fine-tuning.  ( 2 min )
    Scalable Sensitivity and Uncertainty Analysis for Causal-Effect Estimates of Continuous-Valued Interventions. (arXiv:2204.10022v1 [cs.LG])
    Estimating the effects of continuous-valued interventions from observational data is critically important in fields such as climate science, healthcare, and economics. Recent work focuses on designing neural-network architectures and regularization functions to allow for scalable estimation of average and individual-level dose response curves from high-dimensional, large-sample data. Such methodologies assume ignorability (all confounding variables are observed) and positivity (all levels of treatment can be observed for every unit described by a given covariate value), which are especially challenged in the continuous treatment regime. Developing scalable sensitivity and uncertainty analyses that allow us to understand the ignorance induced in our estimates when these assumptions are relaxed receives less attention. Here, we develop a continuous treatment-effect marginal sensitivity model (CMSM) and derive bounds that agree with both the observed data and a researcher-defined level of hidden confounding. We introduce a scalable algorithm to derive the bounds and uncertainty-aware deep models to efficiently estimate these bounds for high-dimensional, large-sample observational data. We validate our methods using both synthetic and real-world experiments. For the latter, we work in concert with climate scientists interested in evaluating the climatological impacts of human emissions on cloud properties using satellite observations from the past 15 years: a finite-data problem known to be complicated by the presence of a multitude of unobserved confounders.  ( 2 min )
    Exploring Structural Sparsity of Deep Networks via Inverse Scale Spaces. (arXiv:1905.09449v5 [cs.LG] UPDATED)
    The great success of deep neural networks is built upon their over-parameterization, which smooths the optimization landscape without degrading the generalization ability. Despite the benefits of over-parameterization, a huge amount of parameters makes deep networks cumbersome in daily life applications. Though techniques such as pruning and distillation are developed, they are expensive in fully training a dense network as backward selection methods, and there is still a void on systematically exploring forward selection methods for learning structural sparsity in deep networks. To fill in this gap, this paper proposes a new approach based on differential inclusions of inverse scale spaces, which generate a family of models from simple to complex ones along the dynamics via coupling a pair of parameters, such that over-parameterized deep models and their structural sparsity can be explored simultaneously. This kind of differential inclusion scheme has a simple discretization, dubbed Deep structure splitting Linearized Bregman Iteration (DessiLBI), whose global convergence in learning deep networks could be established under the Kurdyka-Lojasiewicz framework. Experimental evidence shows that our method achieves comparable and even better performance than the competitive optimizers in exploring the sparse structure of several widely used backbones on the benchmark datasets. Remarkably, with early stopping, our method unveils `winning tickets' in early epochs: the effective sparse network structures with comparable test accuracy to fully trained over-parameterized models, that are further transferable to similar alternative tasks. Furthermore, our method is able to grow networks efficiently with adaptive filter configurations, demonstrating a good performance with much less computational cost. Codes and models can be downloaded at {https://github.com/DessiLBI2020/DessiLBI}.  ( 3 min )
    Out-of-distribution generalization for learning quantum dynamics. (arXiv:2204.10268v1 [quant-ph])
    Generalization bounds are a critical tool to assess the training data requirements of Quantum Machine Learning (QML). Recent work has established guarantees for in-distribution generalization of quantum neural networks (QNNs), where training and testing data are assumed to be drawn from the same data distribution. However, there are currently no results on out-of-distribution generalization in QML, where we require a trained model to perform well even on data drawn from a distribution different from the training distribution. In this work, we prove out-of-distribution generalization for the task of learning an unknown unitary using a QNN and for a broad class of training and testing distributions. In particular, we show that one can learn the action of a unitary on entangled states using only product state training data. We numerically illustrate this by showing that the evolution of a Heisenberg spin chain can be learned using only product training states. Since product states can be prepared using only single-qubit gates, this advances the prospects of learning quantum dynamics using near term quantum computers and quantum experiments, and further opens up new methods for both the classical and quantum compilation of quantum circuits.  ( 2 min )
    Path-Specific Objectives for Safer Agent Incentives. (arXiv:2204.10018v1 [cs.AI])
    We present a general framework for training safe agents whose naive incentives are unsafe. As an example, manipulative or deceptive behaviour can improve rewards but should be avoided. Most approaches fail here: agents maximize expected return by any means necessary. We formally describe settings with 'delicate' parts of the state which should not be used as a means to an end. We then train agents to maximize the causal effect of actions on the expected return which is not mediated by the delicate parts of state, using Causal Influence Diagram analysis. The resulting agents have no incentive to control the delicate state. We further show how our framework unifies and generalizes existing proposals.  ( 2 min )
    Ultra Marginal Feature Importance. (arXiv:2204.09938v1 [stat.ML])
    Scientists frequently prioritize learning from data rather than training the best possible model; however, research in machine learning often prioritizes the latter. The development of marginal feature importance methods, such as marginal contribution feature importance, attempts to break this trend by providing a useful framework for explaining relationships in data in an interpretable fashion. In this work, we generalize the framework of marginal contribution feature importance to improve performance with regards to detecting correlated interactions and reducing runtime. To do so, we consider "information subsets" of the set of features $F$ and show that our importance metric can be computed directly after applying fair representation learning methods from the AI fairness literature. The methods of optimal transport and linear regression are considered and explored experimentally for removing all the information of our feature of interest $f$ from the feature set $F$. Given these implementations, we show on real and simulated data that ultra marginal feature importance performs at least as well as marginal contribution feature importance, with substantially faster computation time and better performance in the presence of correlated interactions and unrelated features.  ( 2 min )

  • Open

    I have a time series of time, temp, humidity, apparent temp, and ac/heater/fan state
    I want to create a NN that takes these readings and makes predictions about "what will the readings be in 5 minutes if I turn the AC on?". I'm thinking of training it with "the angle of the sun at that time", "temp", "humidity", and "ac/heater/fan state"(3) and then extracting data pairs spaced by 5 minutes where the system was in that state for the entire interval. Then I'm thinking I should use the 5-minute-later apparent temp as the training output. So the NN ultimately answers the question "what would be the apparent temperature if the system were to be in the given state for the next 5 minutes?" Am I on the right track here? submitted by /u/HasFiveVowels [link] [comments]  ( 1 min )
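    The pair-extraction step described above can be sketched in Python; the field names and the one-reading-per-minute spacing are illustrative assumptions, not from the post:

```python
# Minimal sketch of extracting (input, apparent-temp-5-minutes-later) training
# pairs from a time series. Field names and the 1-minute sampling interval are
# assumptions for illustration.
def make_pairs(readings, horizon=5):
    """readings: list of dicts sampled once per minute, each with keys
    'sun_angle', 'temp', 'humidity', 'apparent_temp', 'state' (0=off, 1=AC, 2=heat)."""
    pairs = []
    for i in range(len(readings) - horizon):
        window = readings[i:i + horizon + 1]
        # keep the pair only if the system stayed in the same state for the
        # whole interval, as the post proposes
        if all(r['state'] == window[0]['state'] for r in window):
            x = [window[0]['sun_angle'], window[0]['temp'],
                 window[0]['humidity'], window[0]['state']]
            y = window[-1]['apparent_temp']
            pairs.append((x, y))
    return pairs

# toy check: constant state, apparent temp falling while the AC is on
readings = [{'sun_angle': 40.0, 'temp': 25.0 - 0.1 * t, 'humidity': 50.0,
             'apparent_temp': 26.0 - 0.1 * t, 'state': 1} for t in range(10)]
pairs = make_pairs(readings)
print(len(pairs))  # 5 usable pairs from 10 samples
```

    The NN would then be fit on (x, y) pairs, which matches the question as stated.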
  • Open

    An AI painting some colorful pitbulls
    submitted by /u/p0goniphaft111 [link] [comments]
    Building A Pictionary App (sketch recognition model) with Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Is there an AI I can use to edit images (selfies, etc.)?
    Like, I mark some areas of my pictures and then select what should happen with them? submitted by /u/xXLisa28Xx [link] [comments]
    What AI can I use to make caricatures from pictures of people?
    Is Artbreeder the best way to do it, or is there a better way? submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 1 min )
    Learning or working with AI? Come join us, we are a Discord Community with over 20'000 members! Ask questions, find teammates, share your projects, attend events, and much more to come!
    Programming is way more fun when you learn/work with someone: help each other, ask questions, brainstorm, etc. There is just so much benefit to joining a community when you are in this field, especially when you cannot find the question you are looking for on Stack Overflow! 😉 The same is true for AI, and it is why, a little less than two years ago, I created a Discord server where anyone learning or working in the field can come and share their projects, learn together, work together, and much more. The community now has over 20,000 members, which is unbelievable! So glad to see it growing and see everyone so active. We also have an amazing partnership coming with an AI company, which is super exciting for the community. You definitely want to be there to enjoy all the benefits they will give us. Come join us if you are in the field of AI! https://discord.gg/learnaitogether submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    Analyse sentiment/tonality in social networks
    submitted by /u/akolonin [link] [comments]  ( 1 min )
    [Research] Explaining the Black Box Optimization Competition Winner Algorithm-HEBO Algorithm of AI Top Conference NeurIPS 2020
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 3 min )
  • Open

    [R] GAM(e) changer or not? An evaluation of interpretable machine learning models based on additive model constraints
    https://arxiv.org/abs/2204.09123 https://www.researchgate.net/publication/360079336_GAMe_changer_or_not_An_evaluation_of_interpretable_machine_learning_models_based_on_additive_model_constraints submitted by /u/Positive_Ad_1090 [link] [comments]
    How can you differentiate the Kornia SIFT descriptor? [P]
    Kornia is a differentiable library for computer vision based on PyTorch. Does anyone have experience with their SIFT descriptor? Which parts can you differentiate? submitted by /u/avd4292 [link] [comments]
    [D] Evaluating and Selecting Models: Based on Loss or Metrics?
    When it comes to evaluating and selecting a model, should one focus on minimizing the loss (e.g., sparse categorical cross-entropy) or on obtaining highly rated metrics (e.g., F1)? Often the model with the best metrics generates a higher loss than models with lower metric scores on the validation/test sets. Some would say to focus on metrics, as the loss is for the machine to optimize learning: what stays in the training stays in the training. However, wouldn't the loss also be an important element to consider, since it also describes the performance of the model, particularly when obtained on the test set? How should one prioritize? Do metrics/loss rule all, or should one seek a balance? submitted by /u/Hydraze [link] [comments]  ( 1 min )
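    A toy illustration of why the two criteria can disagree (synthetic numbers, not from the post): the loss rewards calibrated probabilities, while F1 only sees the thresholded decisions.

```python
import math

def log_loss(y_true, p_pred):
    # average binary cross-entropy
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for y, p in zip(y_true, p_pred)) / len(y_true)

def f1(y_true, y_pred):
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn)

y = [1, 1, 1, 0, 0, 0]
# model A: well calibrated, but one positive falls below the 0.5 threshold
p_a = [0.9, 0.9, 0.45, 0.1, 0.1, 0.1]
# model B: every prediction lands on the right side of 0.5, but only barely
p_b = [0.55, 0.55, 0.55, 0.45, 0.45, 0.45]

loss_a, loss_b = log_loss(y, p_a), log_loss(y, p_b)
f1_a = f1(y, [int(p >= 0.5) for p in p_a])
f1_b = f1(y, [int(p >= 0.5) for p in p_b])
print(loss_a < loss_b, f1_a < f1_b)  # A wins on loss, B wins on F1
```

    So the ranking of models genuinely depends on which criterion you optimize, which is why the choice should follow the deployment goal (calibrated probabilities vs. hard decisions).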
    [R] Optimize clustering for downstream task
    Assume to have a 2-step algorithm: 1) aggregate data points into clusters 2) feed the clusters to a downstream task (e.g. classification, regression, etc). Is there any work that explores how to optimize the clustering in 1) to achieve the best performance in the downstream task 2)? One example would be a differentiable clustering algorithm that receives gradients from the downstream task or a parametrized clustering algorithm whose parameters are automatically tuned to increase the performance of the downstream task. I have found very little on this topic in the literature, could you point me to some relevant work? submitted by /u/fedetask [link] [comments]  ( 1 min )
    [D] What is a good emoji aware pre-trained language model?
    I am classifying social media posts (Facebook, Instagram), where emojis can make up the entire content. For example, you may want to tag "🤮🤮🤮" as in need of moderation, and "🤔🤔🤔" as prioritized for a response. Looking for a good model to fine-tune, I found BERTweet, which seems at least somewhat emoji-aware. However, it also has a ton of out-of-vocabulary results, both for emoji and for semi-common English words, despite its liberal use of emoji.demojize and splitting up more complex emoji: https://preview.redd.it/t6ai3o8le1v81.png?width=687&format=png&auto=webp&s=c16157addbe1b3d34858708f3e6c7517e64d26ec A model like `xlm-roberta-base`, with a larger vocabulary (250k) and more robust tokenization, seems to have some 500 emoji directly in its vocabulary, without converting them to text. This seems potentially more promising, but it also guarantees a token like 🤮 is simply out of vocabulary rather than being interpreted by word pieces. Has anyone here had experience dealing with emoji in text classification, and what approaches were most successful? submitted by /u/sanderbaduk [link] [comments]  ( 2 min )
    [D] What is the best method to use metric network at finetune after contrastive learning?
    Hi, I have a question about how to use a metric network after contrastive learning. If I have trained a network well with NCELoss, I would like to fine-tune this network to match the best output for a given input (as used when calculating the NCELoss). Is there a good way to do this? Thank you for reading! submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    [D] Opinions needed - Anyone interested in mock peer review?
    We'd like to know if anyone is interested in participating in a mock peer review. Basically, if you have a paper you'd like to get feedback on, and would like to review others' papers in exchange, you're welcome to continue reading. We are gauging public interest in mock peer review and exploring the possibility of hosting the reviews on DouBlind. We'd like to know your answers to the following questions: Are you interested in mock peer review? Do you want to do this privately (paper and review are kept inside a small group) or openly (paper and review are open)? How many papers would you like to review? Do you have any concerns? submitted by /u/DouBlindDotCOM [link] [comments]  ( 2 min )
    [Research] Explaining the Black Box Optimization Competition Winner Algorithm-HEBO Algorithm of AI Top Conference NeurIPS 2020
    This is reproduced from Zhihu and translated with DeepL, shared only for enthusiasts to discuss. MindSpore, a device-edge-cloud collaborative, full-scenario AI open-source framework, balances the flexibility of academic research with the high-performance needs of industry. It supports device, edge, and cloud scenarios, and gives developers simpler programming, easier debugging, superior performance, and more flexible deployment. It has received widespread attention and adoption in industry, has been open-sourced since 2020-03-28, and is the open-source project with the highest Gitee index. Welcome to participate in open-source contributions, model crowdsourcing collaboration, industry innovation and application, algorithm innovation, academic collabora…  ( 3 min )
  • Open

    Data Pre-Processing in TF-Agents
    Hi everyone, this is my first post, please go easy on me. I'm currently playing around with a bigger model in tf-agents. I've only worked with structured data before (TF, SKLearn, Pandas...). Now I'm struggling a bit with the preprocessing and where in the architecture to place it. I use multiple inputs and encoding layers for each of them. For training the encoders I used some SKLearn pre-processors (StandardScaler, MinMaxScaler, KBinsDiscretizer). I'd like to reuse the pre-processing pipeline in the model, or extract its information for other pre-processing mechanisms (e.g. pre-processing TF layers). The current options I came up with:
    1. Incorporate it directly into the environment and return the pre-processed observation. Pro: easy; I can probably use my SKLearn pipeline. Con: I'd like to keep the architecture clean, so the environment should only give out raw values, not values prepared for a certain model.
    2. Use an environment wrapper around the "raw" environment. Pro: the "raw" environment needs no tuning. Con: not sure if I can use my pipeline here, and not sure if I'm taking a bad path.
    3. Use pre-processing TF layers. Pro: most of the API is there and can be used in my encoder networks; it seems to be the TFic way. Con: the SKLearn pre-processors have values for each column, while the layers (e.g. rescaling) seem to take only one configuration for a whole tensor. I could probably create a layer for each value in the tensor, but that doesn't feel like how it's supposed to be done.
    If you have more options, or can share your experience with one of the options mentioned above, I would be very glad. submitted by /u/Kjiessar [link] [comments]  ( 1 min )
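    A framework-free sketch of the environment-wrapper option above, assuming per-column statistics taken from fitted SKLearn scalers (`scaler.mean_`, `scaler.scale_`); all class and method names here are illustrative, not tf-agents API:

```python
class ScalingWrapper:
    """Wraps a 'raw' environment and standardizes each observation column
    with per-column statistics, e.g. taken from a fitted StandardScaler."""
    def __init__(self, env, means, scales):
        self.env = env
        self.means = means
        self.scales = scales

    def _scale(self, obs):
        # per-column standardization, mirroring StandardScaler.transform
        return [(o - m) / s for o, m, s in zip(obs, self.means, self.scales)]

    def reset(self):
        return self._scale(self.env.reset())

    def step(self, action):
        obs, reward, done = self.env.step(action)
        return self._scale(obs), reward, done

# hypothetical raw environment, only for illustration
class RawEnv:
    def reset(self):
        return [20.0, 1000.0]          # e.g. temperature, pressure
    def step(self, action):
        return [22.0, 1010.0], 0.0, False

env = ScalingWrapper(RawEnv(), means=[21.0, 1005.0], scales=[1.0, 5.0])
print(env.reset())  # [-1.0, -1.0]
```

    This keeps the raw environment clean while reusing the statistics the SKLearn pipeline already learned.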
    Masking in RNN in the actor network
    I am using PPO in the context of multi-agent RL. I was wondering if PyTorch has a way of handling when hidden states should be reinitialized to zeros. What I have found is this implementation:

    def forward(self, x, hxs, masks):
        if x.size(0) == hxs.size(0):
            x, hxs = self.rnn(
                x.unsqueeze(0),
                (hxs * masks.repeat(1, self._recurrent_N).unsqueeze(-1)).transpose(0, 1).contiguous(),
            )
            x = x.squeeze(0)
            hxs = hxs.transpose(0, 1)
        else:
            # x is a (T, N, -1) tensor that has been flattened to (T * N, -1)
            N = hxs.size(0)
            T = int(x.size(0) / N)
            # unflatten
            x = x.view(T, N, x.size(1))
            # Same deal with masks
            masks = masks.view(T, N)
            # Let's figure out which steps in the sequence have a zero for any agent.
            # We will always assume t=0 has a zero in it, as that makes the logic cleaner.
            has_zeros = ((masks[1:] == 0.0)
                         .any(dim=-1)
                         .nonzero()
                         .squeeze()
                         .cpu())
            # +1 to correct for masks[1:]
            if has_zeros.dim() == 0:
                # Deal with scalar
                has_zeros = [has_zeros.item() + 1]
            else:
                has_zeros = (has_zeros + 1).numpy().tolist()
            # add t=0 and t=T to the list
            has_zeros = [0] + has_zeros + [T]
            hxs = hxs.transpose(0, 1)
            outputs = []
            for i in range(len(has_zeros) - 1):
                # We can now process steps that don't have any zeros in masks together!
                # This is much faster.
                start_idx = has_zeros[i]
                end_idx = has_zeros[i + 1]
                temp = (hxs * masks[start_idx].view(1, -1, 1)
                        .repeat(self._recurrent_N, 1, 1)).contiguous()
                rnn_scores, hxs = self.rnn(x[start_idx:end_idx], temp)
                outputs.append(rnn_scores)
            # assert len(outputs) == T
            # x is a (T, N, -1) tensor
            x = torch.cat(outputs, dim=0)
            # flatten
            x = x.reshape(T * N, -1)
            hxs = hxs.transpose(0, 1)

    submitted by /u/No_Possibility_7588 [link] [comments]  ( 2 min )
    Does anyone know of a chess environment written in JAX?
    I don't think an open-source one exists, but figured I'd ask here because you never know what's lying around the internet! As an aside, if one doesn't exist, let me know if you're interested in partnering to write one! Edit: For anyone wondering, I need the env to be in JAX because my MuZero implementation is in JAX and I need the env to run on TPU cores, not the CPU. submitted by /u/evanatyourservice [link] [comments]  ( 1 min )
    Papers that use neural networks solely for planning in large MDPs (i.e., no learning)
    I am looking for any papers that do the following: use neural networks in the RL pipeline because the state space is too large for calculating the optimal policy using traditional tabular value iteration or policy iteration. In this setting, the model is completely known, i.e., there is no learning. Most papers I see with deep RL assume that the transition probabilities are unknown and that they have access to a simulator that gives them the ability to query data points. I am looking for existing work in deep RL where the transition probabilities are known but the problem is intractable using tabular methods. Any direction would be appreciated, thanks! submitted by /u/lolillini [link] [comments]  ( 1 min )
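    For context, "planning with function approximation under a known model" can be illustrated with fitted value iteration, which replaces the value table with a regressor and applies the known Bellman backup at sampled states. A minimal sketch on a toy 1-D problem (not from any particular paper; in a large problem a neural network would replace the least-squares fit):

```python
import numpy as np

# Known deterministic MDP on states s in [0, 1]: actions move left/right by 0.1,
# reward is highest near s = 0.5. The model is fully known; function
# approximation is only needed because (in a real problem) the state space is
# too large for a table.
ACTIONS = [-0.1, 0.1]
GAMMA = 0.9

def model(s, a):                      # known transition and reward
    s2 = min(1.0, max(0.0, s + a))
    return s2, -abs(s2 - 0.5)

def features(s):                      # simple polynomial features
    return np.array([1.0, s, s * s])

w = np.zeros(3)                       # V(s) ~ features(s) @ w
S = np.linspace(0.0, 1.0, 41)         # sampled states standing in for a huge space
for _ in range(100):                  # fitted value iteration
    targets = []
    for s in S:
        backups = [r + GAMMA * features(s2) @ w
                   for s2, r in (model(s, a) for a in ACTIONS)]
        targets.append(max(backups))  # exact Bellman backup (model is known)
    X = np.stack([features(s) for s in S])
    w, *_ = np.linalg.lstsq(X, np.array(targets), rcond=None)

def greedy(s):
    # one-step lookahead through the known model and the fitted V
    return max(ACTIONS,
               key=lambda a: model(s, a)[1] + GAMMA * features(model(s, a)[0]) @ w)

print(greedy(0.2), greedy(0.8))  # both point toward the high-reward region
```

    The deep-RL papers being asked about replace the least-squares fit with a neural network but keep exactly this structure: exact backups through a known model, approximation only over states.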
    PPO update without using NNs / batch updates
    Hello, I'm making a new post as I couldn't find any answers to this before (although this Reddit post is similar to my issue). I am trying to implement a simple multivariate Gaussian policy without neural networks, basically using a standard policy-gradient update with SGD and the score-function gradient, without batches. The reason for this is to avoid unstable updates, i.e., overly large changes to the mean/variance. The idea is thus to use a trust-region update to keep the updates within some reasonable size. I am a little confused about maximizing the surrogate objective. As seen in this Stack Overflow post, we wish to maximize [pi/pi_old], compared to [log(pi)] in vanilla PG. Since I do not use automatic differentiation, but a single stochastic gradient step, how do I find the gradient of pi/pi_old? To my understanding, the flow of the algorithm is: sample experience -> compute new policy parameters -> compare with previous policy -> construct surrogate function -> perform SGD on the surrogate to get the actual new policy. It is the last step I am struggling with. submitted by /u/Acrobatic-Ad-9189 [link] [comments]  ( 1 min )
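    For the last step above, the surrogate gradient is available in closed form via the identity grad(pi/pi_old) = (pi/pi_old) * grad(log pi), so no autodiff is needed. A minimal sketch for a 1-D Gaussian policy with fixed variance and PPO-style clipping (all numbers here are toy assumptions):

```python
import math
import random

def gaussian_logpdf(a, mu, sigma):
    return -0.5 * ((a - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def ppo_step(mu, mu_old, sigma, actions, advantages, lr=0.05, clip=0.2):
    """One manual SGD step on the clipped surrogate for the policy mean."""
    grad = 0.0
    for a, adv in zip(actions, advantages):
        ratio = math.exp(gaussian_logpdf(a, mu, sigma)
                         - gaussian_logpdf(a, mu_old, sigma))
        # Outside the clip range (sign of the advantage taken into account)
        # the clipped surrogate is flat, so the sample contributes no gradient.
        if (adv >= 0 and ratio > 1 + clip) or (adv < 0 and ratio < 1 - clip):
            continue
        # d(ratio)/d(mu) = ratio * d(log pi)/d(mu) = ratio * (a - mu) / sigma^2
        grad += adv * ratio * (a - mu) / sigma ** 2
    grad /= len(actions)
    return mu + lr * grad  # gradient ascent on the surrogate

random.seed(0)
mu_old, sigma = 0.0, 1.0
actions = [random.gauss(mu_old, sigma) for _ in range(256)]
advantages = list(actions)  # toy advantages: reward increases with the action
mu = mu_old
for _ in range(20):          # several epochs against the same fixed old policy
    mu = ppo_step(mu, mu_old, sigma, actions, advantages)
print(mu > mu_old)           # the mean moves toward higher-advantage actions
```

    The clipping here plays the trust-region role: once the ratio leaves [1 - clip, 1 + clip] in the unfavourable direction, the sample stops pushing the parameters, which bounds the update size.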
    Simulating robotic arm for object manipulation
    I'll be starting work on object manipulation using deep RL, and I would like to start from scratch. Please recommend sources, tools, and software used for this purpose. I won't be working on modeling the robot; instead, I will use any robot with a gripper that can be interfaced with ROS. Also, please link GitHub repositories that could be helpful in the learning process. Thanks submitted by /u/Western-Age3148 [link] [comments]  ( 1 min )
    policy-encoding mapping implementation
    Hi, I want to implement the policy-encoding mapping e : (S → A) → R^k from Universal Successor Features Approximators. I don't know how to feed one network into another network as an embedding; there are too many weights! Do you have any ideas? Thank you for reading! submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    Useful Tools and Resources for Reinforcement Learning
    Found a useful list of Tools, Frameworks, and Resources for RL/ML. It covers Reinforcement learning, Machine Learning (TensorFlow & PyTorch), Core ML, Deep Learning, Computer Vision (CV). I thought I'd share it for anyone that's interested submitted by /u/Khaotic_Kernel [link] [comments]  ( 1 min )
  • Open

    Pix2Seq: A New Language Interface for Object Detection
    Posted by Ting Chen and David Fleet, Research Scientists, Google Research, Brain Team Object detection is a long-standing computer vision task that attempts to recognize and localize all objects of interest in an image. The complexity arises when trying to identify or localize all object instances while also avoiding duplication. Existing approaches, like Faster R-CNN and DETR, are carefully designed and highly customized in the choice of architecture and loss function. This specialization of existing systems has created two major barriers: (1) it adds complexity in tuning and training the different parts of the system (e.g., region proposal network, graph matching with GIOU loss, etc.), and (2), it can reduce the ability of a model to generalize, necessitating a redesign of the model for…  ( 7 min )
  • Open

    Secure AWS CodeArtifact access for isolated Amazon SageMaker notebook instances
    AWS CodeArtifact allows developers to connect internal code repositories to upstream code repositories like Pypi, Maven, or NPM. AWS CodeArtifact is a powerful addition to CI/CD workflows on AWS, but it is similarly effective for code-bases hosted on a Jupyter notebook. This is a common development paradigm for Machine Learning developers that build and train […]  ( 9 min )
  • Open

    7 Ways Your Business Can Plan For Artificial Intelligence
    Artificial Intelligence is all over the world today. From the use of virtual assistants like Siri, Alexa, or Cortana, to improving…  ( 2 min )
  • Open

    By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet
    Different parts of the globe are experiencing distinct climate challenges — severe drought, dangerous flooding, reduced biodiversity or dense air pollution. The challenges are so great that no country can solve them on their own. But innovative startups worldwide are lighting the way, demonstrating how these daunting challenges can be better understood and addressed with Read article > The post By Land, Sea and Space: How 5 Startups Are Using AI to Help Save the Planet appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    Last Week in AI: Chip Startup Funding Doubled, Google Text+Image Search, Analog AI, Criminal Robotaxi
    submitted by /u/regalalgorithm [link] [comments]
    AI Dream 31 - Spaceships Galore Planet VQGAN CLIP
    submitted by /u/LordPewPew777 [link] [comments]
    What would be the best approach to auto-generate comic panels (Garfield style) with drawings and speech bubbles, assuming I have tons of scans to use as training?
    I'm a software developer but I'm not really experienced in AI. Would it be best to train first for speech bubbles and separately for panel drawings? What kind of network is the best for this? Just thinking that it would be a cool project to have auto generated legible infinite comic strips for a semi niche comic strip that runs in my country. submitted by /u/dananite [link] [comments]  ( 1 min )
    Any Recommendations for AI Content Generation Software?
    Content generation is such a time-suck for small businesses, and it seems like an interesting vertical to apply AI. The AI would generate the content after being given a prompt. There are already a few tools trying this, but the quality doesn't seem to be very high. Are there better tools that I'm missing, or is the consumer-facing software so early-stage that it would be better to hire a data scientist and train an AI system specifically for this purpose? https://www.reddit.com/r/MachinesWrite/comments/f45eav/list_of_ai_text_generators/?utm_source=share&utm_medium=web2x&context=3 https://www.reddit.com/r/juststart/comments/axa8w3/ai_ml_text_generators/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/CliffWoolum [link] [comments]  ( 1 min )
    Looking for enterprise conversational AI platform
    submitted by /u/sunstormfirefall [link] [comments]  ( 1 min )
    VICReg: Tutorial and Lightweight PyTorch Implementation blog post
    Here's a tutorial and lightweight PyTorch implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Hope you find it helpful! submitted by /u/thejashGI [link] [comments]
    Microsoft AI Researchers Develop ‘Ekya’ To Address The Problem Of Data Drift On The Edge Compute Box And Enables Both Retraining And Inference To Co-Exist On It
    Deep neural network (DNN) models for object recognition and classification, such as YOLO, ResNet, and EfficientNet, are used in video analytics applications such as urban mobility and smart automobiles. There is a symbiotic link between edge computing and video analytics, with live video analytics often called the "killer app" for edge computing. Edge devices come in various sizes and designs, but they are always resource-constrained compared to the cloud. Video analytics deployments send the videos to on-premises edge servers. The paper addresses the difficulty of supporting inference and retraining jobs on edge servers simultaneously, which necessitates navigating the fundamental tradeoff between the accuracy of the retrained model and the accuracy of inference. Edge computation is preferred for video analytics because it eliminates the need for expensive network links to transmit videos to the cloud while also preserving video privacy. Edge compute has a finite amount of resources (e.g., weak GPUs). The mismatch between the growth rate of model compute needs and the total cycles of processors exacerbates this problem. As a result, model compression is used in edge deployments. Continue reading our summary of this research. Paper: https://www.microsoft.com/en-us/research/uploads/prod/2021/07/nsdi22spring-final74.pdf Github: https://github.com/edge-video-services/ekya submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Is there an AI that can turn normal videos into sketches like the video below?
    submitted by /u/TheblackRook3 [link] [comments]  ( 1 min )
    Automatic Summaries of your Documents in Google Docs !
    submitted by /u/OnlyProggingForFun [link] [comments]
    How to achieve a training duration on MindSpore that's less than or equal to that on TensorFlow?
    submitted by /u/Creative_Habit_6868 [link] [comments]  ( 1 min )
    CypherZilla - The First Encoded NFT Made By AI To Support Trump. Upvote If You Want To Have A Huge Impact!
    CypherZilla on OpenSea https://reddit.com/link/u8he59/video/i9a29i5ywtu81/player submitted by /u/thecypherbeast [link] [comments]
    What price do we have to pay for progress in AI? Have a look:
    https://www.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]
    Collaboration AI video and music
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    Question about trained models
    Hello, I have a question. For example, in the case of an inverted pendulum or cartpole, I train the model to keep the pole at 0 degrees (vertical) and it works. Now I want this same model to hold the pole at another position, for example 3 degrees. Do I have to retrain the model to achieve this, or can I somehow reuse what the trained model has already learned and give it the new target position as input? I'm not sure I explained myself well; these are mostly doubts about how to interact with a model and how to properly use one that has already been trained. If anyone has example code (Python, gym) on interacting with a trained model, it would be really helpful. submitted by /u/Sleyck [link] [comments]  ( 1 min )
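    Since the poster asks for example code: one common way to get a single model to track different setpoints is goal-conditioning — make the target angle part of the policy's input and train across a range of targets; a model trained only for the 0-degree goal generally cannot be repointed without retraining. Below is a minimal pure-Python sketch (no gym): a hand-written PD law stands in for a trained goal-conditioned network, and all dynamics constants are made up for illustration.

    ```python
    import math

    def run_episode(policy, target_deg, steps=500, dt=0.02):
        """Toy linearized inverted-pendulum rollout (hypothetical dynamics)."""
        theta, omega = 0.1, 0.0               # start slightly off-vertical (radians)
        target = math.radians(target_deg)
        for _ in range(steps):
            u = policy(theta, omega, target)  # the goal is part of the policy input
            omega += dt * (9.8 * theta + u)   # gravity destabilizes theta = 0
            theta += dt * omega
        return math.degrees(theta)

    # Stand-in for a trained goal-conditioned policy: because the target is an
    # input, the SAME policy can hold the pole near 0, 3, or any nearby angle.
    def pd_policy(theta, omega, target):
        return -200.0 * (theta - target) - 12.0 * omega

    print(abs(run_episode(pd_policy, 3.0) - 3.0) < 0.5)  # settles near 3 degrees
    ```

    The point of the sketch: changing the setpoint requires no retraining, only a different goal input, which is exactly what a goal-conditioned RL policy would receive in its observation vector.
    
    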
    Why is this implementation of PPO using a replay buffer?
    https://github.com/marlbenchmark/on-policy/blob/main/onpolicy/algorithms/r_mappo/r_mappo.py submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    What is the role of masks in the computation of GAE?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Question About Optimal Policy Guarantees in POMDPs
    I'm working on a project where I'm trying to prove the existence of a particular set of functions by showing it can be constructed as the solution to a Markov Decision Process. However, it seems that it's much simpler to convert it to a partially observable MDP, rather than a classic one. I know it's been proven that the set of optimal policies for a classic MDP is nonempty, and intuitively I feel like the same should hold for POMDPs, but I'm having a hard time finding a particular source proving such a thing. Does anyone know where I ought to look? submitted by /u/LessPoliticalAccount [link] [comments]  ( 1 min )
    Can reinforcement learning learn itself? A reply to 'Reward is enough' (PDF)
    submitted by /u/JBaloney [link] [comments]  ( 1 min )
    What is this line in the Sutton/Barto textbook referring to?
    In the first edition of the textbook, the section on actor-critic methods (link) describes the classical approach of using the temporal difference error 𝛿 to modify the probability of selecting action a in state s: https://preview.redd.it/fa35vut7ewu81.png?width=238&format=png&auto=webp&s=c1b8952b065a90ecd2b8c0c30b985e36d37dbc30 Then they briefly mention that one variation on the classical approach is to scale temporal difference error 𝛿 by the inverse of the probability of selecting the action a, where that probability is given by 𝜋(s, a): https://preview.redd.it/69gd3zmbdwu81.png?width=375&format=png&auto=webp&s=b66a7d5eef3c2b7bc256473aed728223921a751c They say: " These issues were explored early on, primarily for the immediate reward case (Sutton, 1984; Williams, 1992) and have not been brought fully up to date." This idea is relevant to a project I'm working on, and I'd like to read more about it. But the references seem to be dead ends: Sutton 1984 is his PhD thesis, which I can't find a digital copy of, and Williams 1992 is this paper, which doesn't seem to contain this idea. Also this section doesn't seem to appear in the second edition of the textbook. You folks are much smarter than I am: Does modifying the update in this way mean anything to you? Are there modern approaches that do something like this? Or should I assume it was a little-explored idea in the early days that has been more-or-less forgotten? Thanks very much! submitted by /u/Careless-Argument-37 [link] [comments]  ( 2 min )
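    For what it's worth, the δ/π(s, a) scaling from the excerpt is easy to experiment with in the immediate-reward case: dividing by the selection probability compensates for how often each action's preference gets updated (the same correction that makes REINFORCE-style estimates unbiased). A toy two-action sketch of the update as described — all constants are arbitrary, and this is an illustration, not a claim about the original references:

    ```python
    import math
    import random

    random.seed(1)
    prefs = [0.0, 0.0]       # action preferences p(s, a) for a single state
    rewards = [0.0, 1.0]     # immediate-reward case: action 1 is better
    alpha = 0.1

    def softmax(p):
        exps = [math.exp(x - max(p)) for x in p]  # subtract max for stability
        total = sum(exps)
        return [e / total for e in exps]

    for _ in range(2000):
        pi = softmax(prefs)
        a = 0 if random.random() < pi[0] else 1
        # TD error with the expected immediate reward as baseline
        delta = rewards[a] - sum(pi[i] * rewards[i] for i in range(2))
        prefs[a] += alpha * delta / pi[a]         # the 1/pi(s, a) scaling
    print(softmax(prefs)[1])  # close to 1: policy concentrates on the better action
    ```

    Here the scaled update converges because each update moves the chosen action's preference in the correct direction regardless of how rarely it is sampled.
    
    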
    Reinforcement Learning with delays
    I was wondering what methods there are for RL with time delay other than augmenting the state space with the action buffer or using a model to undelay the environment. I've seen this post How to deal with the time delay in reinforcement learning? - Artificial Intelligence Stack Exchange however it's rather brief and I wondered if there were any more recent advancements. I am also struggling to understand partial trajectory resampling ( 2010.02966.pdf (arxiv.org) ) and the code in the accompanying repo. GitHub - rmst/rlrd: PyTorch implementation of our paper Reinforcement Learning with Random Delays (ICLR 2020) I was wondering how we can resample actions in environments with constant delays if those actions are used in the state space for all subsequent chosen actions? submitted by /u/SuperDuperDooken [link] [comments]  ( 1 min )
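    Regarding the action-buffer approach mentioned above, the mechanics are simple to sketch: a wrapper delays each chosen action by d steps and exposes the pending-action buffer as part of the observation, which restores the Markov property for a constant delay. The toy environment and class names below are made up for illustration:

    ```python
    from collections import deque

    class CounterEnv:
        """Trivial test environment: the state accumulates the applied action."""
        def reset(self):
            self.state = 0
            return self.state
        def step(self, action):
            self.state += action
            return self.state, float(action), False   # obs, reward, done

    class ConstantDelayWrapper:
        """Delay every action by `delay` steps and expose the pending-action
        buffer in the observation, so the augmented state is Markov again."""
        def __init__(self, env, delay):
            self.env, self.delay = env, delay
        def reset(self):
            self.buffer = deque([0] * self.delay, maxlen=self.delay)
            return (self.env.reset(), tuple(self.buffer))
        def step(self, action):
            applied = self.buffer.popleft()           # chosen `delay` steps ago
            self.buffer.append(action)
            obs, reward, done = self.env.step(applied)
            return (obs, tuple(self.buffer)), reward, done

    env = ConstantDelayWrapper(CounterEnv(), delay=2)
    env.reset()                                       # obs is (0, (0, 0))
    for _ in range(3):
        (obs, pending), _, _ = env.step(1)
    print(obs, pending)   # the first two steps applied the buffered zeros
    ```

    The resampling question in the post is about exactly this buffer: with a constant delay the pending actions are known, so a newly recomputed action can replace a buffered one before it is applied.
    
    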
    Is it stupid to use rl to control solar panel angle?
    submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    How can I use the environment in Emergence of Locomotion Behaviours in Rich Environments?
    Hi, I want to train my agent in the environment used in "Emergence of Locomotion Behaviours in Rich Environments". Here is a video about that https://www.youtube.com/watch?v=hx_bgoTF7bs. Is the environment released? Thanks for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    [P] mGPT model released: a multilingual GPT-3-like model for 61 languages
    Hi everyone. Today we released the mGPT model: a multilingual generative pre-trained transformer. The checkpoints are available on the Huggingface model page. The example usage is at the Github repo https://github.com/ai-forever/mgpt The model has 1.3 billion parameters and a context length of 512 tokens. The model can generate sequences after the input prompt and can be used for fine-tuning or for zero- and few-shot learning:
    from transformers import GPT2LMHeadModel, GPT2Tokenizer
    model_name = "sberbank-ai/mGPT"
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.cuda()
    model.eval()
    texts = [ "My favourite holiday is ", "Իմ սիրելի տոնն է ", "Моє улюблене свято ", "mi fiesta favorita es ", "मेरी पसंदीदा छुट्टी है", "我最喜欢的节日是", "Min…  ( 2 min )
    [P] Deep Learning GPU Benchmark: A Latency-Based Approach
    Hi r/MachineLearning! I want to share with you a fun side project of mine on benchmarking the GPUs for deep learning: [project page]. https://preview.redd.it/7olwqyze5yu81.png?width=2041&format=png&auto=webp&s=25aecb9733366720a2be5cecc2048eb2a734c9b9 Here are some key features: It helps to estimate the runtime of algorithms on a different GPU. It measures GPU processing speed independent of GPU memory capacity. It contains adjustable weightings through interactive UIs. The project page also explains how this benchmark differs from existing ones, and why this benchmark is more relevant to academic research. I would love to know what you think! submitted by /u/roll-a-dice [link] [comments]  ( 1 min )
    [D] What's your perfect laptop for deep learning research?
    I'm using a MBP 2015; it's a pretty solid laptop and I like it a lot, though it feels slow and I've started to look for a replacement. Given that I run all experiments on GPU-dedicated servers, my laptop mostly serves me as a typewriter. That's ok, but I'd like to get more out of it. Frankly, I'm a bit disappointed by the 2021 MacBooks; hope they'll be improved in 2022. Recently Lambda Labs together with Razer announced their Tensorbook https://lambdalabs.com/deep-learning/laptops/tensorbook . Their pricing looks weird to me: the more you pay, the more years of support you get, and that's the only thing which differentiates the base bundle from enterprise. Also there is no option to customize its hardware, though the basic bundle itself looks ok; its price is $3500, like the M1 Max's. What's your opinion about this laptop in particular? Would you buy it? Generally this laptop looks like a cool thing to have for local model development, even from a tent somewhere in Nepal, given that you have enough power banks to charge it. :) What's your choice of a laptop for DL? My biggest requirement is a durable laptop that will serve at least 5 years, ideally with an NVIDIA GPU for development and debugging. submitted by /u/taras-sereda [link] [comments]  ( 1 min )
    [R] Deep models of superficial face judgments (PNAS)
    Figure: transformations that alter the perception of target faces. Paper: https://www.pnas.org/doi/10.1073/pnas.2115228119 Dataset: https://onemillionimpressions.com/ submitted by /u/joshuacpeterson [link] [comments]
    [R] Planting Undetectable Backdoors in Machine Learning Models
    submitted by /u/Wiskkey [link] [comments]
    [P] VICReg: Tutorial and Lightweight PyTorch Implementation blog post
    Here's a tutorial and lightweight PyTorch implementation of VICReg: Variance-Invariance-Covariance Regularization for Self-Supervised Learning. Hope you find it helpful! submitted by /u/thejashGI [link] [comments]
    [P] Announcing cleanlab 2.0: Automatically Find Errors in ML Datasets
    Hi folks. This morning I released the new cleanlab 2.0 Python package for automatically finding errors in datasets and doing machine learning/analytics with real-world, messy data and labels. tl;dr - cleanlab provides a framework to streamline data-centric AI. https://preview.redd.it/hq1kyasvwwu81.png?width=2279&format=png&auto=webp&s=4fa3c82ec66d685c8fc4f95c5d9a0fc4be192d6b After the 1.0 launch last year, engineers used cleanlab at Google to clean and train robust models on speech data, at Amazon to estimate how often the Alexa device doesn’t wake, at Wells Fargo to train reliable financial prediction models, and at Microsoft, Tesla, Facebook, etc. Joined by two good friends from grad school, we completely rebuilt cleanlab 2.0 to work for all data scientists, ML datasets, and models; and hit a…  ( 2 min )
    [P] Galp Hackathon - Win 10.000€ from home!
    If you are passionate about Data & AI we have the perfect challenge for you! The applications for Galp’s Hackathon Retail 4.0 are OPEN! With this Hackathon, Galp is challenging the community to propose solutions to specific problems and use cases that they think could improve their typical customer journey in the service stations. Gather a team and come up with an innovative solution for a chance of winning 10.000€! Let’s shape the future of Galp's retail? Apply now: https://taikai.network/en/galp/hackathons/retail40 https://preview.redd.it/wkfb6ybuwwu81.png?width=3334&format=png&auto=webp&s=deef13767df5ba607e387ce4e278ae3981d93582 submitted by /u/migueldsalmeida [link] [comments]  ( 1 min )
    [D] Imbalanced multi class classification 📌
    I'm working on a machine learning problem: multi-class classification with an imbalanced class distribution. Obviously my model favours the classes with more data and fails to predict the classes with little data. What techniques can I use to help the model distinguish all the classes equally well? P.S. I'm avoiding SMOTE so that the model is trained on real data rather than generated samples. submitted by /u/According-Promise-23 [link] [comments]  ( 2 min )
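    One standard SMOTE-free option is cost-sensitive learning: reweight each class inversely to its frequency and pass the weights to the loss (e.g. class_weight in scikit-learn estimators, or the weight argument of PyTorch's CrossEntropyLoss). A minimal sketch of the usual "balanced" formula:

    ```python
    from collections import Counter

    def balanced_class_weights(labels):
        """w_c = n_samples / (n_classes * count_c), the inverse-frequency
        formula behind scikit-learn's class_weight='balanced'."""
        counts = Counter(labels)
        n, k = len(labels), len(counts)
        return {c: n / (k * cnt) for c, cnt in counts.items()}

    y = ["common"] * 90 + ["rare"] * 10
    weights = balanced_class_weights(y)
    print(weights)   # the rare class gets 9x the weight of the common one
    ```

    Because the weights scale each class's contribution to the loss, no synthetic samples are generated and the model still trains only on real data.
    
    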
    [R] CVPR 2022 - Photorealistic Monocular 3D Reconstruction of Humans Wearing Clothing
    submitted by /u/SleekEagle [link] [comments]
    [R] My continuously updated machine learning research notes
    Dear ML researchers, For many years I've been continuously updating my machine learning research notes for my PhD students and everyone online. I don't like uploading to arXiv just to get "citations", and GitHub serves me well. Hope they are useful for you: https://github.com/roboticcam/machine-learning-notes Richard submitted by /u/MLknowledge [link] [comments]  ( 1 min )
    [D] Correcting for imbalance in regression datasets
    Hi, I am performing an Image --> scalar regression. The output scalar I am trying to estimate follows a roughly Gaussian distribution. I notice that the DNN is biased toward outputting values near the mean (makes sense). This seems like a problem of imbalanced data. For classification, I can oversample minority classes. What is the equivalent for regression? Is there a technique for regression where we oversample "outliers" and undersample central values? submitted by /u/rsandler [link] [comments]  ( 1 min )
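    A common regression analogue is density-based sample weighting rather than resampling: estimate how dense the target distribution is around each label (histogram or KDE) and weight each sample's loss by the inverse density, so tail values count for more. This is the idea behind label-distribution smoothing in the deep imbalanced-regression literature; below is a rough stdlib-only sketch, with the histogram binning standing in for a proper density estimate:

    ```python
    import random

    def inverse_density_weights(targets, n_bins=10):
        """Weight each sample by the inverse frequency of its target's
        histogram bin, then normalize the weights to mean 1."""
        lo, hi = min(targets), max(targets)
        width = (hi - lo) / n_bins or 1.0
        counts = [0] * n_bins
        bin_of = [min(int((t - lo) / width), n_bins - 1) for t in targets]
        for b in bin_of:
            counts[b] += 1
        raw = [1.0 / counts[b] for b in bin_of]
        scale = len(targets) / sum(raw)
        return [w * scale for w in raw]

    random.seed(0)
    y = [random.gauss(0.0, 1.0) for _ in range(1000)]
    w = inverse_density_weights(y)
    center = min(range(len(y)), key=lambda i: abs(y[i]))  # sample nearest the mean
    tail = max(range(len(y)), key=lambda i: abs(y[i]))    # most extreme sample
    print(w[tail] > w[center])   # tail samples are up-weighted
    ```

    These per-sample weights can then be passed to the loss (e.g. a weighted MSE), which counteracts the pull toward the mean without discarding or duplicating any images.
    
    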
    Building Dense Passage Retrievers [P]
    Hi, I made a video explaining the ideas behind building a Dense Passage Retriever(DPR). Whenever we talk about retrievers, we mostly refer to the DPR formulation which appeared in this paper. A lot of publicly available implementations also use this formulation. In a previous video, we discussed how to use the DPR End-to-End QA system which uses DPR with a QA model. In this video, we solely focus on retrievers and the ideas behind building them. The implementation is quite similar to retrievers pre-trained with Inverse Close Task. This video is part 8 of 9 video series on Open-domain question answering using Dense retrievers. Thanks for the support and I will appreciate any feedback. https://www.youtube.com/watch?v=w61p0HLo7gc submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] How do you usually run sanity checks when training GANs ?
    Hi, I have been studying super-resolution with GANs and took a look at SRGAN and ESRGAN. I have spent the whole day running experiments to see whether I can manage to overfit on a single batch of 16 / 32 / 128 examples (MNIST). I have found that it's almost impossible to use this tactic as a sanity check, because the model simply cannot generate good-quality samples. I would like to know your thoughts on this, and how you would run sanity checks for GANs. Thank you! submitted by /u/Frizzoux [link] [comments]  ( 1 min )
    [D] Amazon Releases a New Multilingual Dataset for NLU
    https://www.amazon.science/blog/amazon-releases-51-language-dataset-for-language-understanding submitted by /u/__lawless [link] [comments]
    [D] How to handle features that apply to a whole csv-file vs single rows?
    Hi all, I have csv-files (~300) with a fixed set of columns (~40) but varying number of rows (sum of all rows ~300 000) and multiple labels per csv that I want to predict. Because of the limited number of csv-files and as a first try I am predicting the labels row-wise (attaching the label to all the rows of one csv-file) which works well for some labels but not for others. Currently, I am calculating some features for every row and just appending them to the row and some features for the whole csv-file and appending them to every row. Two problems are now arising that I would like to hear some input about: The number of features per csv is growing and it seems like a waste to copy them to every row. For some labels it is probably reasonable to throw away most of the rows and only feed in a handful. How would you design a structure that incorporates the limited number of csv-files and the different ways to treat features (row vs. csv)? submitted by /u/tlklk [link] [comments]  ( 2 min )
    [R][P] Differences in publishing a paper at a conference and in a journal?
    Hi! I am an undergrad and I am going to start my MS in CS this fall. My research interest is mainly in multimodal learning for language and speech. I have written papers before, but both have been peer-reviewed journal papers (Knowledge-Based Systems, Elsevier) [1] [2]. I now want to start publishing in conferences, since I have noticed that it is much easier to get noticed and receive reviews when a paper is presented at a conference. I want to understand how different the publication process is for conferences. I would also appreciate recommendations on conferences in the NLP and speech area, considering this will be my first conference paper. Thanks! (I would also appreciate reviews on my papers if anyone has the time to look them over. Thanks!) submitted by /u/prabhav55221 [link] [comments]  ( 4 min )
    [D] How do you get the most out of arxiv-sanity?
    Basically, I don't want to phrase this as a "how-to" post, but arxiv-sanity-lite really bothers me. How do you guys find recent, promising papers in your area of interest besides following what is published at major conferences? I believe the website is "too lightweight". For example, say I am interested in computer vision papers and I specify that in the tags field (i.e. explicitly typing "computer vision"). How can I list the papers based on a score (basically, goodness of the paper)? Why does using shortcuts (basically links) like "recommend over last week" or "recommend over last 3 days" always (at least for me) end up with 0 results? I've never used the original arxiv-sanity before, so I strongly believe there is something I am missing. submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 1 min )
    [N] New opportunity: PhD Candidate within multisensor data fusion and applied machine learning for analysis of Arctic sea ice
    The Norwegian University of Science and Technology (NTNU) has a vacancy for PhD Candidate within the DIGITALSEAICE project . The project aims to build a multi-scale digital infrastructure that integrates local and regional sea ice models for improved forecasting and understanding of variations in polar ice conditions. More information here: https://www.jobbnorge.no/en/available-jobs/job/224802/ submitted by /u/KatjaKim [link] [comments]  ( 1 min )
    [P] Efficient Deep Learning Book
    We are working on a book that focuses on deep learning efficiency techniques such as quantization, pruning, distillation, etc. for both server-side as well as on-device (smartphones, IoT, etc.) applications. The goal is to introduce these ideas in a single place, without having to parse many papers, try to get a working code sample, and then spend time debugging. With the accompanying codelabs, we hope that our readers can make their models 4-20x smaller, faster, and better in quality. We have released the first four chapter's draft PDFs, and would truly appreciate any sort of comments / feedback. Book: efficientdlbook.com Feedback: hello@efficientdlbook.com submitted by /u/EfficientDLBook [link] [comments]  ( 1 min )
    [D] Interview w/ Google Brain researchers on Sparse Expert Models (Switch Transformers, GLAM, and more...)
    https://youtu.be/ccBMRryxGog This video is an interview with Barret Zoph and William Fedus of Google Brain about Sparse Expert Models. Sparse Expert models have been hugely successful at distributing parts of models, mostly Transformers, across large array of machines and use a routing function to effectively route signals between them. This means that even though these models have a huge number of parameters, the computational load for a given signal does not increase because the model is only sparsely activated. Sparse expert models, such as Switch Transformers and GLAM can scale up to trillions of parameters and bring a number of desirable properties. We discuss everything from the fundamentals, history, strengths and weaknesses, up to the current state of the art of these models. ​ OUTLINE: 0:00 - Intro 0:30 - What are sparse expert models? 4:25 - Start of Interview 5:55 - What do you mean by sparse experts? 8:10 - How does routing work in these models? 12:10 - What is the history of sparse experts? 14:45 - What does an individual expert learn? 19:25 - When are these models appropriate? 22:30 - How comparable are sparse to dense models? 26:30 - How does the pathways system connect to this? 28:45 - What improvements did GLAM make? 31:30 - The "designing sparse experts" paper 37:45 - Can experts be frozen during training? 41:20 - Can the routing function be improved? 47:15 - Can experts be distributed beyond data centers? 50:20 - Are there sparse experts for other domains than NLP? 52:15 - Are sparse and dense models in competition? 53:35 - Where do we go from here? 56:30 - How can people get started with this? 
    Papers: Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity (https://arxiv.org/abs/2101.03961) GLaM: Efficient Scaling of Language Models with Mixture-of-Experts (https://arxiv.org/abs/2112.06905) Designing Effective Sparse Expert Models (https://arxiv.org/abs/2202.08906) submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [P] Interactive semantic map of ICLR 2022
    Next week ICLR 2022 is taking place: fully virtual, with 1000+ high-quality papers. To make sense of this volume of papers, we have indexed them and provide an interactive semantic map of #ICLR2022. Check it out: https://search.zeta-alpha.com/?q=&d=ly&doc_sources=ICLR&sort_by=authority To enjoy the full map, click on [Explore more] and then enter full-screen mode. We will also discuss the program and 10 must-read papers in the Zeta Alpha "Trends in AI" ICLR edition webinar on Monday the 25th, for which you can sign up here: https://us06web.zoom.us/webinar/register/7816505274568/WN_82DzwhXZQbOCSTWgaI9xMw Looking forward to meeting you online at ICLR 2022! https://preview.redd.it/6wdqj4ru7uu81.jpg?width=2202&format=pjpg&auto=webp&s=c97417c9ea39919041949bf3aa38ad33bb6eca5a submitted by /u/EngineerZetaAlpha [link] [comments]  ( 1 min )
    [R] MindSpore Paper Interpretation: MIEHDR CNN: Main Image Enhancement based Ghost-Free High Dynamic Range Imaging using Dual-Lens Systems
    This article is reproduced from Zhihu and translated by DeepL for enthusiasts to communicate. 1. Research Background High dynamic range (HDR) imaging is mainly oriented toward picture display technology. If the range of bright and dark areas in a scene exceeds the maximum luminance range the image can represent, the display quality is greatly reduced; HDR solves this by recording a broader range of luminance, yielding a more faithful display. Current solutions for generating HDR images focus on fusing two low dynamic range (LDR) images of different exposures taken with the same camera. In such solutions, camera shake or object movement during the exposure time produces the proble…  ( 3 min )
    [D] Most efficient way to use large image datasets with clusters for ML?
    I am having trouble finding some general information on this subject. I know I am down the rabbit hole when Google doesn't have an answer. I want to know the best practices for using clusters for machine learning with large amounts of data. I believe I have a close-to-optimal solution but wanted to get some other opinions. My current setup: AWS EKS Kubernetes for the cluster; Kubeflow for the ML platform; Katib for HPT jobs; PyTorch for custom models; spot-instance GPUs; Lustre for file serving to the models. My data: millions of images stored in S3, ~50TB total. What is the most efficient way to move my data to the cluster? My current approach: preprocess the data with a dedicated instance and store it in S3; the master runs on a dedicated node; Katib spins up a set number of GPU spot nodes; a claim is made, and an FSx Lustre system is generated for the pod. Advantages: very fast training and data movement with spot training. Disadvantages: I have to spin up several Lustre systems for the training. Possible alternative: same as above, but use EFS as a distributed file system so I don't have to wait for Lustre. Advantages: potentially cheaper as I have only one FS. Disadvantages: slow throughput; I read this was a bad idea. Other alternatives: use a PyTorch streaming function with S3 (boosted transfer speed); have every pod download the data to an EBS volume; give up and switch to SageMaker. Anyone with experience in these technologies, I would really appreciate hearing your thoughts. submitted by /u/thewineiswater [link] [comments]  ( 1 min )
    [D] [P] Neural network: same prediction for different inputs
    I am getting the same prediction for different inputs. I am trying to use a regression neural network. Since the data is huge, I am training one example at a time. Here is a simplified version of my code:
    from keras.models import Sequential
    from keras.layers import Dense
    model = Sequential()
    model.add(Dense(10000, input_dim=212207, kernel_initializer='normal', activation='relu'))
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1, kernel_initializer='normal'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    for i in range(10000000):
        # X is an input with 212207 values; Y is an output value
        if i < 6000000:
            model.fit(X.transpose(), Y, epochs=30, batch_size=1, verbose=0)
        else:
            prediction = model.predict(X.transpose())
    I made sure that I am training on different examples and trying predictions on different examples. I am still getting the same prediction value for all testing inputs. I think I made some mistake in defining the model for a regression neural network. Can you please check if the code is correct? submitted by /u/exoplanet_hunter [link] [comments]  ( 1 min )
    Fixed points of bilinear transformations
    Introduction I was puzzled the first time I saw bilinear transformations, also known as Möbius transformations. I was in a class where everything had been abstract and general, and suddenly things got very concrete and specific. I wondered why we had changed gears, and I wondered how there could be much to say about something […] Fixed points of bilinear transformations first appeared on John D. Cook.  ( 2 min )
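    For anyone arriving from search, the concrete computation the post builds on is short. A fixed point of the bilinear (Möbius) transformation satisfies:

    ```latex
    f(z) = \frac{az + b}{cz + d}, \qquad f(z) = z
    \;\Longrightarrow\; cz^2 + (d - a)z - b = 0 .
    ```

    So for c ≠ 0 there are generically two fixed points, which coincide when the discriminant (d − a)² + 4bc vanishes; for c = 0 the map is affine, with one finite fixed point z = b/(d − a) plus the fixed point at ∞.
    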
    Partitioning complexity
    This post looks at how to partition complexity between definitions and theorems, and why it’s useful to be able to partition things more than one way. Quadratic equations Imagine the following dialog in an algebra class. “Quadratic equations always have two roots.” “But what about (x – 5)² = 0. That just has one root, […] Partitioning complexity first appeared on John D. Cook.  ( 4 min )
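    The resolution the teaser hints at: whether "(x − 5)² = 0 has two roots" holds depends on where the complexity is partitioned. If "root" is defined with multiplicity, the clean statement survives:

    ```latex
    (x-5)^2 = 0 \;\Longrightarrow\; x = 5 \ \text{with multiplicity } 2 ,
    ```

    so the fundamental theorem of algebra's "a degree-n polynomial has n roots" stays exception-free once multiplicity is folded into the definition rather than into the theorem.
    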
    Hidden Interfaces for Ambient Computing
    Posted by Alex Olwal, Research Scientist, Google Augmented Reality and Artem Dementyev, Hardware Engineer, Google Research As consumer electronics and internet-connected appliances are becoming more common, homes are beginning to embrace various types of connected devices that offer functionality like music control, voice assistance, and home automation. A graceful integration of devices requires adaptation to existing aesthetics and user styles rather than simply adding screens, which can easily disrupt a visual space, especially when they become monolithic surfaces or black screens when powered down or not actively used. Thus there is an increasing desire to create connected ambient computing devices and appliances that can preserve the aesthetics of everyday materials, while providing …  ( 7 min )
    Specify and extract information from documents using the new Queries feature in Amazon Textract
    Amazon Textract is a machine learning (ML) service that automatically extracts text, handwriting, and data from any document or image. Amazon Textract now offers the flexibility to specify the data you need to extract from documents using the new Queries feature within the Analyze Document API. You don’t need to know the structure of the […]  ( 11 min )
    Understanding the Difference between Loss Functions and Metrics in Machine Learning/Deep Learning
    Yes! You read the heading right. There’s indeed a difference between loss functions and Metrics in the field of Machine Learning. However…  ( 2 min )
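    The teaser is cut off, but the core distinction is easy to demonstrate: a loss must be smooth so gradients exist for training, while a metric only needs to score finished predictions and is often a non-differentiable step function like accuracy. A small illustrative sketch:

    ```python
    import math

    # A loss must be differentiable for gradient descent; binary cross-entropy
    # is smooth in the predicted probability p.
    def cross_entropy(p, y):
        return -(y * math.log(p) + (1 - y) * math.log(1 - p))

    # A metric only scores final predictions; accuracy thresholds at 0.5 and is
    # a step function, so it cannot be optimized directly by gradients.
    def accuracy(preds, ys):
        return sum(int((p > 0.5) == bool(y)) for p, y in zip(preds, ys)) / len(ys)

    print(round(cross_entropy(0.9, 1), 3))     # 0.105: small loss when confident and correct
    print(accuracy([0.9, 0.2, 0.6], [1, 0, 0]))
    ```

    This is why training typically minimizes a surrogate loss like cross-entropy while model selection and reporting use metrics like accuracy or F1.
    
    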
    A new state of the art for unsupervised vision
    MIT CSAIL scientists created an algorithm to solve one of the hardest tasks in computer vision: assigning a label to every pixel in the world, without human supervision.  ( 7 min )
    Anticipating others’ behavior on the road
    A new machine-learning system may someday help driverless cars predict the next moves of nearby drivers, cyclists, and pedestrians in real-time.  ( 7 min )
    Tooth Tech: AI Takes Bite Out of Dental Slide Misses by Assisting Doctors
    Your next trip to the dentist might offer a taste of AI. Pearl, a West Hollywood startup, provides AI for dental images to assist in diagnosis. It landed FDA clearance last month, the first to get such a go-ahead for dentistry AI. The approval paves the way for its use in clinics across the United Read article > The post Tooth Tech: AI Takes Bite Out of Dental Slide Misses by Assisting Doctors appeared first on NVIDIA Blog.  ( 4 min )
    GFN Thursday Is Fit for the Gods: ‘God of War’ Arrives on GeForce NOW
    The gods must be smiling this GFN Thursday — God of War today joins the GeForce NOW library. Sony Interactive Entertainment and Santa Monica Studios’ masterpiece is available to stream from GeForce NOW servers, across nearly all devices and at up to 1440p and 120 frames per second for RTX 3080 members. Get ready to Read article > The post GFN Thursday Is Fit for the Gods: ‘God of War’ Arrives on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )
    NN from Scratch: #4 Backward Propagation | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    Searching for volunteers for ML-based Ukrainian volunteer project.
    We are searching for trustworthy volunteers with some free time who would like to contribute to a digital Ukrainian volunteer project. Our system relies heavily on an image recognition pipeline with a number of specialized filters involving facial recognition, object recognition, logo detection, photoshop detection, etc. People with professional experience in any of these areas are preferred, but novice ML people are welcome to join us in a different capacity. DM to learn more about the project; glad to discuss the details with you. submitted by /u/eelgirl [link] [comments]  ( 1 min )
  • Open

    18 Differences Between Good and Great Data Scientists
    If you are employed as a data scientist and have survived (or strived!) in your position for more than a year, chances are you are at least a good data scientist. This is particularly true if you were promoted. The difference between a mediocre and a good data scientist will be the topic of a… Read More »18 Differences Between Good and Great Data Scientists The post 18 Differences Between Good and Great Data Scientists appeared first on Data Science Central.  ( 6 min )

    Is there any difference between how DDPG and PPO use the replay buffer?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Any tips for a prospective graduate student in Reinforcement Learning?
    Hello Everyone, I apologize ahead of time if posts like this aren't looked well upon on this sub, but I couldn't find rules against this and I also think this is the best, most niche sub for my question. I also made a new account just to be safe haha. ​ Anyways, I will be graduating this spring with a BS in Computer Science and a BA in Mathematics. I have been researching Machine Learning since my sophomore year (adversarial machine learning) under a professor at my university and recently took upon a second, concurrent research position in RL since last summer. ​ My goal is to get into a PhD program at a higher level than my current university (my current university is good, but doesn't really have much of an AI focus as I've already taken all the AI grad courses as an undergrad). I'm…  ( 3 min )
    Task Allocation problem with graph representation
    Hey everyone, I've recently started working on a task allocation problem using RL. I'd just like to make sure my thinking is correct on how best to approach the problem. At the moment, we have (effectively) a graph traversal sim for n agents, where the goal is to minimize the total distance over an episode, as determined by setting the correct tasks. The task supplied to each agent determines the route that is taken, and therefore the distance. The current idea is to supply an input graph that also contains information on the current location of the agents. A second input would be the set of available tasks. The expected output would be produced through a pointer network, where we produce a reordered set of the tasks in descending order of optimality. When step is called, the sim runs until a new task is needed (an agent completes its route). In general, does anyone know a good way to represent the inputs and output of this problem? A pointer network seems like it could work to produce actions, but if I need to do a forward pass for every agent, it seems that there would be no consideration of other agents when determining tasks (we shouldn't have 2 agents doing the same task). For the graph representation, a graph NN seems like an obvious choice, but I just wanted to see if anyone had any insight on why it may or may not be used. submitted by /u/asdfsflhasdfa [link] [comments]  ( 1 min )
    Universities working on reinforcement learning for robotics.
    Can you name any good universities (with high acceptance rate) which are working on reinforcement learning for robotics and also accept students from other branches (i.e. Electrical, Mechanical Engineering). submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
    Is the game of chess a finite MDP?
    In the standard intro to RL book, I have read that any MDP that has finite actions and states is a finite MDP. But that limit is subjective. So there are approximately 10^45 states. If I limit myself to 10^5 states, can I say that chess isn't a finite MDP? submitted by /u/BraveProfessional656 [link] [comments]  ( 1 min )
    Reinforcement learning over traditional machine learning method in Finance/Banking ?
    I am currently studying use cases of RL in finance/banking/insurance and I am keen to understand its advantages and disadvantages compared with traditional methods. submitted by /u/kachua26 [link] [comments]  ( 1 min )
  • Open

    FormNet: Beyond Sequential Modeling for Form-Based Document Understanding
    Posted by Chen-Yu Lee and Chun-Liang Li, Research Scientists, Google Research, Cloud AI Team Form-based document understanding is a growing research topic because of its practical potential for automatically converting unstructured text data into structured information to gain insight about a document’s contents. Recent sequence modeling, which is a self-attention mechanism that directly models relationships between all words in a selection of text, has demonstrated state-of-the-art performance on natural language tasks. A natural approach to handle form document understanding tasks is to first serialize the form documents (usually in a left-to-right, top-to-bottom fashion) and then apply state-of-the-art sequence models to them. However, form documents often have more complex layouts …  ( 8 min )
  • Open

    [D] Who are using physics informed neural networks (PINN) in the industry?
    I stumbled upon this JD from Hitachi Energy, which mentions PINN in the section of preferred background: https://www.linkedin.com/jobs/view/2923292435/ Is PINN gaining more attention? And are there more players? submitted by /u/Kohomologia [link] [comments]  ( 1 min )
    [D] Is quantum AI a real thing? (from the software perspective)
    Hi all, I'm keeping an eye on the state of the art in quantum hardware, but what about software? I can think of many questions and maybe some of you are in the field. What should be the impact of quantum on ML/DL, realistically? What might be a roadmap for the software? And do quantum simulators already have some benefit for AI? What are the best projects out there? I've seen many but haven't been very convinced. submitted by /u/IntelligentHat1657 [link] [comments]  ( 2 min )
    [D] A fairer AI freelancer marketplace that cares about freelancers' career advancement and benefits
    Hi, ML freelancers. I'm starting a freelancing marketplace tailored only for AI talent, and I especially care about the welfare of freelancers, so I plan to add these: (1) you will be treated more like employees of the platform, so we provide training (for all), potentially a health care plan (for people who have worked steadily >20 hours a week), and a career advancement plan, with mentors drawn from experienced freelancers you can learn from; (2) open discussion between employers and you so that you can scope the project better and set a reasonable rate and timeline; (3) we potentially provide MLOps tools to improve your productivity; (4) we avoid global competition by matching businesses only with local-region freelancers or areas that are more expensive. How attractive do you think this will be? And are any of these benefits already provided by Upwork, Freelancer, Toptal, or Fiverr? submitted by /u/meame2010 [link] [comments]  ( 2 min )
    [D] Diffusion models video tutorial
    Diffusion models have been behind a recent string of impressive generative results, including OpenAI's DALL-E 2. They’re powered by a simple yet expressive core mechanism. New video covering how they work: https://youtu.be/fbLgFrlTnGU submitted by /u/ariseff [link] [comments]
    [D] Building the Model Behind DoorDash’s Expansive Merchant Selection
    Interested in how DoorDash maintains a well performing and diverse selection in the numerous markets they operate in despite entering the delivery market relatively late ? I had the opportunity to collaborate in this project which involved building a number of models that measured customer preferences, identified market cuisine categories, and predicted merchants' performance on the platform. I wanted to share the approach and some of the technical details with the ML community to get feedback on what we can improve and to show this cool use case to others working on similar sales enablement based models. Check out the blog post I wrote and let me know what you think of our approach. Building the Model Behind DoorDash’s Expansive Merchant Selection submitted by /u/EfficientString7431 [link] [comments]  ( 1 min )
    [D] What's hot in deep learning research at the moment ?
    I took a break from deep learning( starting from last October) , now i want to get back, start with a new project and read papers . Where should i focus ? Should i keep working on vision transformers or maybe start something on geometric deep learning . What's hot and what's going on ? submitted by /u/ovotheking [link] [comments]  ( 1 min )
    [P] A simple PyTorch YOLOv1 training pipeline GitHub Repo
    https://github.com/sovit-123/yolov1_pytorch_voc07 ​ Also, I write about Deep Learning and Machine Learning on https://debuggercafe.com/ Please check it out and let me know if somebody wants any blog posts on a specific topic. submitted by /u/sovit-123 [link] [comments]
    [P] Programmatic: Powerful Weak Labeling
    Hi all!, Really excited to share a project we've been working on and get your feedback! We've made: Programmatic — an NLP annotation tool for building large labeled datasets for NLP without manual annotation Programmatic is like a REPL for data annotation. You: 1. Write simple rules/functions that can approximately label the data 2. Get near-instant feedback across your entire corpus 3. Iterate and improve your rules Finally, it uses a Bayesian label model [1] to convert these noisy annotations into a single, large, clean dataset, which you can then use for training machine learning models. You can programmatically label millions of datapoints in the time taken to hand-label hundreds. What we do differently from weak supervision packages like Snorkel/skweak[1] is to focus on UI to …  ( 2 min )
    [D] What's your opinion on project promoting posts in this sub? Your vote matters.
    There are many project-promoting posts in this sub; you may like or dislike them. And if you dislike any of my posts, allow me to apologize first. However, it got me thinking. Several years ago I was a moderator of a quite large forum; because I didn't have enough time to fulfill my responsibilities, I decided to retire (yes, moderators can, and I remained as the VIP user status that only retired moderators get). This is a large community, a machine learning community. Besides continuously removing some of these posts, with no clear rules on it, can we do any better? We have all the data; can't we train the model? Here are my three proposals, and please offer some excellent ideas besides my poor ones: Self-promoting posts should have value beyond the promotion itself, and no annoying content. A self-promoted project can be used as a tool in a non-self-promoting post, as long as the post creates valuable content and the promotion is not obvious or annoying. Depending on the number of new project posts, a Weekly/Daily project post can be created by a moderator and pinned to the top; all the promoting content goes into the comments, which we can explore and upvote. Here are some illustrations: 1. Direct Promoting Post 2. Indirect Promoting Post 3. Weekly/Daily Promoting Post by a Moderator, pinned to top, comments by project owners, upvotes/downvotes by us. Which do you think is acceptable? Or do you have better ideas? Leave a comment. It's a machine learning sub; don't let the machine solve it better than us. View Poll submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 3 min )
    [R] Differentiable signal processing for optical communication with Google JAX
    Hey folks, I wrote a mini project based on JAX for optical communications signal processing. https://github.com/remifan/commplax I have a research article as a use case demo: https://remifan.github.io/gdbp_study/article.html This tool essentially: implements adaptive DSP equalizers as stateful NN layers (thanks to JAX's explicit stateful syntax); implements compositor interfaces from scratch to wrap up those stateful layers with other regular NN layers so that they can be trained together; and homebrews serial compositions of stateful layers. It is a fun project for me and I feel JAX fits this research use really elegantly. What do you think about JAX? I appreciate your comments :) submitted by /u/StreetPrice1909 [link] [comments]  ( 1 min )
    [D] Tracking the hardware usage while running CV NN Model on a 1000 Images
    Hi guys, I've been working on a machine learning project and I wanted to see how hardware resources are being used when I run inference on let's say 1000 images. How could i calculate the CPU(running inference on CPU)/RAM workload in that timeframe? I'm running it on a Linux Ubuntu VM. Thanks in advance! submitted by /u/Fifi0912 [link] [comments]  ( 1 min )
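    One way to answer the question above with only the Python standard library on Linux is to wrap the inference call and read process-level counters before and after; `workload` below is a hypothetical stand-in for your own "run inference on 1000 images" function:

    ```python
    # Minimal sketch: measure wall time, CPU time, and peak memory of a
    # workload on Linux using only the standard library.
    import resource
    import time

    def measure(workload):
        t0_wall = time.perf_counter()
        t0_cpu = time.process_time()
        workload()
        wall = time.perf_counter() - t0_wall
        cpu = time.process_time() - t0_cpu
        # On Linux, ru_maxrss is the peak resident set size in kilobytes.
        peak_rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        return {
            "wall_s": wall,
            "cpu_s": cpu,
            "cpu_util_pct": 100.0 * cpu / wall if wall > 0 else 0.0,
            "peak_rss_kb": peak_rss_kb,
        }

    stats = measure(lambda: sum(i * i for i in range(10**6)))
    ```

    Note that cpu_util_pct can exceed 100% when inference uses multiple threads; for finer-grained sampling over the timeframe, the third-party psutil package can poll `Process().cpu_percent()` and `memory_info().rss` from a background thread instead.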
    [D] Running interactive Python notebooks on HuggingFace Spaces
    I'm working on a framework Mercury for converting Python notebooks into interactive web apps. It can add widgets to the notebook based on the YAML configuration. End-user can tweak widgets values and execute the notebook. The resulting notebook can be downloaded as single-file HTML. Simple. The framework is built on Django+React. It is easy to deploy to Heroku or other cloud services. Recently, I made it possible to deploy it to Hugging Face Spaces (faster and larger machines than on free tier Heroku). The process of deployment is simple. You need to create a Gradio app on Spaces (my framework is not supported, yet ;) ). You need to add the app.py file that will run the Mercury server and upload the notebook. You can check the details in the docs. The HF Space with example notebook https://huggingface.co/spaces/pplonski/deploy-mercury submitted by /u/pp314159 [link] [comments]  ( 1 min )
    [D] Conditional GAN with multiple adversarial losses - Implementation?
    I would like to test the architecture from the following paper with a different dataset: https://www.mdpi.com/2072-4292/13/19/3834 The authors state that their objective function is the following: https://preview.redd.it/u78f27jb6nu81.png?width=1027&format=png&auto=webp&s=32790a67ec829a1e79b252edd0714b8b3b5a7f4e Where: -x is the real grayscale image. -s is its downsampled version, which should be used both as the initial input of the generator performing the super-resolution and as a first conditional variable in the learning process. -e is another two-dimensional array containing values for a second additional conditional variable. The authors, however, state that this should be implemented by using two separate conditional adversarial losses, one for each of the conditional variables. To clarify, the first adversarial loss should be: AdvLoss1(ParametersG, ParametersD) = -Log(Discriminator(x,s)) - Log(1-Discriminator(Generator(s),s)) While the second would be: AdvLoss2(ParametersG, ParametersD) = -Log(Discriminator(x,e)) - Log(1-Discriminator(Generator(s),e)) These should then be summed for the backward pass. In my PyTorch implementation, however, I have only been able to set up a single adversarial loss, which could be defined as: CurrentAdvLoss(ParametersG, ParametersD) = -Log(Discriminator(x,(s,e))) - Log(1-Discriminator(Generator(s),(s,e))) I calculate this in the following training loop (simplified version, from the same question asked in the PyTorch forum) as errD and errG, after conditioning the network on both s and e at the same time: https://discuss.pytorch.org/t/conditional-gan-with-multiple-adversarial-losses/149627 My question is: is there a way to modify this loop to obtain outputs that are separately conditioned, first on s and then on e, and thus calculate the two separate adversarial losses the authors originally proposed?
submitted by /u/Franken91 [link] [comments]  ( 2 min )
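    For what it's worth, the summed two-loss objective above can be sketched framework-agnostically. Here `D` and `G` are hypothetical callables standing in for the discriminator (returning a probability in (0, 1)) and the generator, not the paper's actual networks; in PyTorch the same structure amounts to two discriminator forward passes per step, one conditioned on s and one on e, with the two BCE losses added before calling backward().

    ```python
    import math

    def adv_losses(D, G, x, s, e):
        # D(image, cond) -> probability in (0, 1); G(s) -> generated image.
        fake = G(s)
        # First adversarial loss, conditioned on s only.
        loss_s = -math.log(D(x, s)) - math.log(1.0 - D(fake, s))
        # Second adversarial loss, conditioned on e only.
        loss_e = -math.log(D(x, e)) - math.log(1.0 - D(fake, e))
        # Summed so a single backward pass covers both terms.
        return loss_s + loss_e
    ```

    The key point is that the conditioning variable is passed to the discriminator alone in each term, rather than as a concatenated (s, e) pair.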
    [D] IJCAI 2022 Paper Notification
    This is the discussion for accepted/rejected papers in IJCAI 2022. Results are supposed to release today. submitted by /u/errohan400 [link] [comments]  ( 1 min )
    [R] Authors Claim to Have "Solved" MNIST and CIFAR
    Paper: https://arxiv.org/abs/2204.07953v1 Code: https://github.com/decurtoydiaz/learning_with_signatures Tangential resources of interest: https://arxiv.org/abs/1905.08494, https://en.wikipedia.org/wiki/Rough_path#Signature, and https://labelerrors.com/ Personally, from their code on GitHub, I believe there is possible data leakage (in the same vein as the current issue raised there); besides, 100% accuracy on a test set is fishier than a fish market. However, I am very curious to hear from the court of public opinion. How is everyone feeling about this? submitted by /u/blingblingbeepbeep [link] [comments]  ( 4 min )
    [D] How do I evaluate if my data represent the target variable before training a machine learning algorithm?
    I have a point cloud dataset where each point has an associated variable. I am trying to relate local geometry features to that per-point variable using FPFH. This means I am generating my own features from the dataset by first using an area of n points to compute normal-vector estimations, and then computing the FPFH from x normal-vector estimations. However, the numbers x and n are arbitrary, and other combinations might describe the target variable better. So I wanted to know if there is a method to evaluate how good given x and n values are at describing the target variable. I considered the correlation between the features (n, x) and the target variable, but I read that this assumes a linear relationship. I am using scikit-learn. So basically I have features X(x, n) and a target variable Y. Which x and n, in the feature space X(x, n), describe the target variable Y best? I want to do this before training, because training my random forest regressor takes 3-4 hours and I want to test more combinations. submitted by /u/Neo-Rushdian [link] [comments]  ( 1 min )
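    One option that avoids the linearity assumption of correlation is to score each candidate (n, x) feature set by its mutual information with Y, which is cheap compared with a full random forest fit; scikit-learn provides `mutual_info_regression` for exactly this. The sketch below is a simple histogram-based estimate in plain numpy, with made-up random data standing in for the FPFH features:

    ```python
    import numpy as np

    def binned_mutual_info(x, y, bins=16):
        # Histogram estimate of the mutual information (in nats) between a
        # candidate feature x and the target y; higher means more dependence.
        joint, _, _ = np.histogram2d(x, y, bins=bins)
        p_xy = joint / joint.sum()
        p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over y
        p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over x
        mask = p_xy > 0
        return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x @ p_y)[mask])))

    rng = np.random.default_rng(0)
    feat = rng.standard_normal(5000)
    target_dep = feat**2 + 0.1 * rng.standard_normal(5000)  # nonlinear relation
    target_ind = rng.standard_normal(5000)                   # no relation

    mi_dep = binned_mutual_info(feat, target_dep)
    mi_ind = binned_mutual_info(feat, target_ind)
    ```

    The nonlinear y = x^2 relation, nearly invisible to Pearson correlation, shows up clearly as a higher MI score; the (n, x) combination whose features give the highest MI with Y is then the one worth the 3-4 hour training run.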
    [Discussion] Training performance evaluation of MindSpore, a home-grown deep learning framework -- by ADSL Lab, CSU
    The article is reproduced from Zhihu, using deepl machine translation, for all enthusiasts to communicate Abstract Deep learning frameworks are the engines and motors for pushing the boundaries of artificial intelligence applications, and good deep learning frameworks can dramatically shorten the cycle of algorithm innovation and validation. In this report, we focus on the newly launched MindSpore framework, which has received a lot of industry attention, and systematically explore its model training speed on GPU clusters and compare it with popular international frameworks. In the evaluation experiments, we choose two classical models, ResNet and BERT-base, to test and analyze their performance with the same algorithm, the same dataset, and the same or similar performance hardware platf…  ( 7 min )
    [D] Why is the diffusion model so powerful when the math behind it is so simple?
    You can see the 200-line code here: https://nn.labml.ai/diffusion/ddpm/index.html and https://github.com/cloneofsimo/minDiffusion, and the math here: https://lilianweng.github.io/posts/2021-07-11-diffusion-models/ The algorithm is smart and simple, but its generated results seem more incredible than GANs', its speed is fast, and the model size is not too big: https://openai.com/dall-e-2/ , https://huggingface.co/spaces/multimodalart/latentdiffusion, https://www.reddit.com/r/dalle2 So, 1st question: why is the diffusion model so powerful? Can someone explain it? 2nd question: has anyone used diffusion for NLP? UPDATED: "A multiverse portal to a new world opening up above Tokyo" by dalle2 (from r/dalle2) "A robot painting on a canvas while playing the piano" by dalle2 (from r/dalle2) "Mona Lisa in her studio painting Leonardo da Vinci" by dalle2 (from r/dalle2) "Science fiction illustration future city in the night | impressionism" by latentdiffusion "Science fiction illustration of Beauty and monsters | impressionism" by latentdiffusion "a painting of a girl with a fox sitting in a field at sunrise in the style of Claude Monet" by latentdiffusion submitted by /u/ghosthamlet [link] [comments]  ( 4 min )
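    To the point about the math being simple: the entire DDPM forward (noising) process has a closed form, x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps. A minimal numpy sketch using the standard linear beta schedule from the DDPM paper (an illustration, not code taken from the linked repos):

    ```python
    import numpy as np

    betas = np.linspace(1e-4, 0.02, 1000)   # standard DDPM beta schedule
    alpha_bar = np.cumprod(1.0 - betas)      # cumulative product of alphas

    def forward_diffuse(x0, t, rng):
        # Sample x_t ~ q(x_t | x_0) in closed form: no loop over timesteps.
        eps = rng.standard_normal(x0.shape)
        return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

    rng = np.random.default_rng(0)
    x0 = np.ones(10_000)
    x_early = forward_diffuse(x0, 0, rng)    # nearly the original signal
    x_late = forward_diffuse(x0, 999, rng)   # nearly pure Gaussian noise
    ```

    The trained network then only has to predict eps from (x_t, t), which is part of why the whole training loop fits in ~200 lines.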
    [D] Questions about Intel 12th gen Alder Lake CPUs
    I am looking to build a new PC but have struggled to find the info on how Intel's latest CPUs perform for data science/ML, so if anyone is using one for that purpose and can help with one or more of these questions it would be very helpful! Apologies if these questions should be directed elsewhere. I am planning to use WSL2/Ubuntu but have heard that Intel's thread director isn't implemented well yet in Linux (or Windows 10!), so it doesn't assign tasks properly. Has anyone experienced issues with this firsthand? Assuming the thread director is working, are the e-cores utilised at all in any typical DS workflows? E.g. will they get used with joblib or when training scikit-learn/gbms in parallel? Are the e-cores good enough to handle other stuff like web browsing etc whilst the p-cores are maxed out on model training, or is it still necessary to keep at least one p-core free to avoid crashing the PC? Also I have read that in Windows 11 (where the thread director works best) that the active window/tab could be assigned p-cores as a priority, which isn't very helpful for someone who needs to train models in the background etc, but not sure whether this is actually happening in practice. The consensus from benchmarks/reviews is that the hybrid architecture 'just works' and is superior to AMD right now, but those benchmarks are primarily for use in gaming/video editing. submitted by /u/FightingLikeBeavers [link] [comments]  ( 1 min )
    [N] The new Machine Learning Specialization by DeepLearning.AI and Stanford Online is launching soon! Join the Waitlist.
    We’re thrilled to announce a brand new Machine Learning Specialization, in collaboration with DeepLearning.AI, launching in June on Coursera! Learn essential real-world skills from AI pioneer Andrew Ng, who co-founded Google Brain and Coursera, led AI research at Baidu, and has impacted millions of AI learners. This updated 3-course Specialization will cover the latest machine learning techniques as well as foundational AI concepts that made its predecessor one of the world’s most popular machine learning courses. Join the waitlist! https://preview.redd.it/yujr31t6vku81.png?width=5000&format=png&auto=webp&s=0f4c4ef090bcdc7cfb04ee2c817d766f23c236a6 submitted by /u/Stanford_Online [link] [comments]  ( 1 min )
  • Open

    How Meta's multiverse could prove our universe is a fake
    submitted by /u/estasfuera [link] [comments]
    SingularAgent - Many Methods Make Light Work
    submitted by /u/dantheman333 [link] [comments]
    General AI In Healthcare | Machine Learning For Cardiovascular Disease | Color Night Vision
    submitted by /u/getrich_or_diemining [link] [comments]
    A realistic image AI software
    submitted by /u/Eurokiwiboy [link] [comments]
    Ant colony simulation
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    Is there any free open source AI model available for answering any bible related queries?
    A few years back I developed a very simple app just to show a few Bible verses. Though it is a very simple app, it got more than 50K installs without much promotion. So, I am thinking about promoting it, but I hesitate because it is such a simple app. I would like to add some useful features before I start promoting it. Specifically, I would like to add a feature that allows users to ask any question related to the Bible and get a relevant answer. I assume that some Bible data is open source. Is there any free tutorial available on how to implement an AI-based chat system for answering Bible-related queries after training on Bible data? Is there any app already providing this feature? submitted by /u/qptbook [link] [comments]  ( 1 min )
    Today, AI is becoming ubiquitous, in and out of the workplace. With artificial intelligence (AI) becoming more powerful, the questions that surround AI ethics are becoming more relevant.
    But can technology be controlled to avoid adverse outcomes? Let's understand how AI will help us to make a better world. https://us.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]  ( 1 min )
    Top Ethical Challenges in AI – The Price of Progress
    submitted by /u/JencyJane [link] [comments]
    Artificial Nightmares: Dr. Strange || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Weekly China AI News: Chinese Prominent AI Lab Plagiarizes Big Model Paper; Microsoft Research Asia Halts Internship Hiring from US-Banned Universities; Beijing Announces New RISC-V Chip Institute
    submitted by /u/trcytony [link] [comments]  ( 1 min )
  • Open

    French palindromes and Morse code
    I got an email from a student in France who asked about a French counterpart to my post on Morse code palindromes, and this post is a response to that email. Palindromes A palindrome is a word that remains the same when the letters are reversed, like kayak. A Morse code palindrome is a word […] French palindromes and Morse code first appeared on John D. Cook.  ( 2 min )
    Blaschke factors
    Blaschke factors are complex functions with specified zeros inside the unit disk. Given a complex number a with |a| < 1, the Blaschke factor associated with a is the function Notice the semicolon in b(z; a). This is a convention that a few authors follow, and that I wish more would adopt. From a purely […] Blaschke factors first appeared on John D. Cook.  ( 2 min )
  • Open

    AI Application Development Guide for Business Owners
    To start deeply investigating the AI app development process, it’s important to first understand how these projects differ from regular app…  ( 9 min )
  • Open

    Neural Network gets too large and dies
    Hi, I've been working on a project for my computer science class and everything had been working up until the training. I'm following a guide online that has worked fairly well. Whenever I try to train, however, I run into an overflow error and the entire network dies. I'm not sure where to go from here, as I've already tried a few steps to fix the issue; if anyone could offer some advice on fixing my problem, that would be amazing. submitted by /u/djm710 [link] [comments]  ( 2 min )
    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated) -
    submitted by /u/maneesh123456 [link] [comments]
    Question about Sigmoid and Heaviside
    I read a paper and was a little bit confused. The paper said: "Imagine you have a two-dimensional (binary input) classification (0 or 1) problem and you use Sigmoid as an activation function. Since the Sigmoid gives you a real number between 0 and 1, it's not really classification anymore. Therefore you take the output of the Sigmoid (y_Sigmoid) and put it into a shifted Heaviside function H(y-0.5) (so for y_Sigmoid bigger than 0.5, it gives you y_Heavi = 1). The decision boundary is given by a straight line w1a1+w2a2+w0=0, and this whole process only works with the Sigmoid function as the first activation function." The last paragraph confused me. Why can I assume that the decision boundary is exactly that (it's just the "normal decision boundary" for an SLP; why does it also work here), and why does it only work with the Sigmoid function as the first activation function? submitted by /u/LawlHeyman [link] [comments]  ( 1 min )
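    On the first question: because the sigmoid is strictly increasing and crosses 0.5 exactly at z = 0, thresholding y_Sigmoid at 0.5 is equivalent to thresholding z = w1*a1 + w2*a2 + w0 at 0, so the decision boundary is the same straight line as for a plain perceptron. A small sketch of that equivalence (my own illustration, not from the paper):

    ```python
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def classify(a1, a2, w1, w2, w0):
        # H(sigmoid(z) - 0.5) is 1 exactly when z = w1*a1 + w2*a2 + w0 > 0,
        # because sigmoid is monotonic and sigmoid(0) = 0.5.
        z = w1 * a1 + w2 * a2 + w0
        return 1 if sigmoid(z) > 0.5 else 0
    ```

    An activation that was not monotonic, or that crossed 0.5 somewhere other than z = 0, would not reduce to this linear boundary, which is one plausible reading of the paper's "only with the Sigmoid" remark.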
    Starting a neural network
    I want to create a program that can take music I feed it and, over time, create its own music based on the inputs. I know I have to use a neural network and deep learning algorithms, but how do I get started? Thanks. submitted by /u/Saxy-Snark [link] [comments]  ( 2 min )
  • Open

    Offline RL Made Easier: No TD Learning, Advantage Reweighting, or Transformers
    A demonstration of the RvS policy we learn with just supervised learning and a depth-two MLP. It uses no TD learning, advantage reweighting, or Transformers! Offline reinforcement learning (RL) is conventionally approached using value-based methods based on temporal difference (TD) learning. However, many recent algorithms reframe RL as a supervised learning problem. These algorithms learn conditional policies by conditioning on goal states (Lynch et al., 2019; Ghosh et al., 2021), reward-to-go (Kumar et al., 2019; Chen et al., 2021), or language descriptions of the task (Lynch and Sermanet, 2021). We find the simplicity of these methods quite appealing. If supervised learning is enough to solve RL problems, then offline RL could become widely accessible and (relatively) easy to implemen…  ( 5 min )
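    The reward-to-go conditioning described above is easy to sketch: each offline transition becomes a supervised example whose input is the state plus the outcome that actually followed it. This is an illustrative reconstruction of the idea, not the authors' code:

    ```python
    def rvs_dataset(trajectory):
        # trajectory: list of (state, action, reward) tuples from offline data.
        rewards = [r for _, _, r in trajectory]
        inputs, targets = [], []
        for i, (state, action, _) in enumerate(trajectory):
            reward_to_go = sum(rewards[i:])         # outcome to condition on
            inputs.append((state, reward_to_go))    # state + desired outcome
            targets.append(action)                  # supervised target
        return inputs, targets
    ```

    Any ordinary classifier or regressor (such as the depth-two MLP mentioned above) can then be fit on (inputs, targets); at test time you feed the current state together with a high desired reward-to-go.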
  • Open

    DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs
    I just moved. I’d like to say that I was highly organized, that I knew where every box ended up and what was in each box. I would be lying. Most people who move know the feeling of living in boxes even after the movers have left, the days spent dodging labyrinths of teetering cardboard,… Read More »DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs The post DSC Weekly Digest 4/19/2022: The Case for Personal Knowledge Graphs appeared first on Data Science Central.  ( 4 min )
    How Microsoft Power BI Revolutionizes Business
    As cloud-based business intelligence becomes more and more popular in the market, one name has made quite a mark: Power BI. A Microsoft offering, Power BI is an interactive data visualization and analytics tool that promises to revolutionize business. Here are some of its key benefits to help you see how it can do that:… Read More »How Microsoft Power BI Revolutionizes Business The post How Microsoft Power BI Revolutionizes Business appeared first on Data Science Central.  ( 3 min )
    How AI and ML are transforming data quality management?
    Introduction In recent years technology has become prominent, both at work and at home. Machine learning (ML) and Artificial Intelligence (AI) are evolving quickly today. Almost everyone will have some interaction with a form of AI daily. Some common examples include Siri, Google Maps, Netflix, and Social media (Facebook/Snapchat). AI and ML have popularly used buzzwords… Read More »How AI and ML are transforming data quality management? The post How AI and ML are transforming data quality management? appeared first on Data Science Central.  ( 4 min )
    Agile, Agile 2 and Agility, Part II
    In the previous article in this series, we discussed the difference between Agile and business agility and how Agile 2 addresses some of the omissions and failings of traditional Agile.  Both Agile and Agile 2 focus on accelerating digital development; however, the benefits of any Agile approach can be obviated if it is not implemented… Read More »Agile, Agile 2 and Agility, Part II The post Agile, Agile 2 and Agility, Part II appeared first on Data Science Central.  ( 4 min )
  • Open

    Search for knowledge in Quip documents with intelligent search using the Quip connector for Amazon Kendra
    Organizations use collaborative document authoring solutions like Salesforce Quip to embed real-time, collaborative documents inside Salesforce records. Quip is Salesforce’s productivity platform that transforms the way enterprises work together, delivering modern collaboration securely and simply across any device. A Quip repository captures invaluable organizational knowledge in the form of collaborative documents and workflows. However, finding […]  ( 6 min )

  • Open

    Do regulatory data projects really need design-time data lineage? Probably not.
    Your regulatory data project likely has no use case for design-time data lineage. tl/dr Mapping Data Lineage at design time, for its own end, has no regulatory use case or ROI.  Buying a specialist tool to support that mapping has even less ROI.  Regulations see that kind of documentary data lineage as ancillary at best.… Read More »Do regulatory data projects really need design-time data lineage? Probably not. The post Do regulatory data projects really need design-time data lineage? Probably not. appeared first on Data Science Central.  ( 10 min )
    Dark Energy, Dark Data
    During the 1990s, the physics community began to measure the brightness of certain supernovae in a novel way. This new method supported the conclusion Edwin Hubble had first arrived at in 1929 after discovering that galaxies are becoming more and more distant from us: Dark matter and dark energy play a role in why those… Read More »Dark Energy, Dark Data The post Dark Energy, Dark Data appeared first on Data Science Central.  ( 4 min )
    5 Main Benefits of Distributed Cloud Computing
    According to the predictions of Gartner, by 2024, distributed cloud computing opportunities will be offered by most cloud vendors on a service basis. With the increasing rush in the cloud space and digitalization of documentation, this industry is bound to grow. Understanding Distributed Cloud Distributed cloud is an innovation to traditional cloud computing. It means… Read More »5 Main Benefits of Distributed Cloud Computing The post 5 Main Benefits of Distributed Cloud Computing appeared first on Data Science Central.  ( 5 min )
  • Open

    Speeding Up AI Algorithms- Inferencing challenges at the edge
    submitted by /u/Chipdoc [link] [comments]
    Build & share machine learning apps directly in browser using Gradio in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    What if You Are a Prototype for the Ultimate Sentient Artificial Intelligence?
    submitted by /u/IndependenceFun4627 [link] [comments]  ( 1 min )
    Overview of Relational Graph Convolutional Networks (RGCN)
    submitted by /u/aidev2040 [link] [comments]
There are so many crappy chatbots because people don't pay attention to how they perform. If you're one of them, here are metrics to keep in mind
Hi there! Chatbots are not a "set and forget" thing like much other software. If you want to achieve great results with your chatbot, you need to improve it constantly. To know where and what to improve, you need to track and monitor chatbot analytics and the main chatbot metrics. General chatbot metrics: total number of users, user satisfaction, accuracy of the chatbot. Engagement metrics: active users, new users, conversation length, retention rate, bounce rate, flow completion rate. Conversational analytics: goal completion rate (GCR), fallback rate, human takeover rate. Bonus, revenue metrics: revenue generated, ROI / payback period. In the article we covered how to calculate each metric, and you can find the metrics you need depending on the industry you're working in: https://botscrew.com/blog/chatbot-metrics/?utm_source=RedditArtificial&utm_medium=&utm_campaign=&utm_term=&utm_content= submitted by /u/Avandegraund [link] [comments]  ( 1 min )
    Wake-up Call for Science – AI System Develops 40,000 Chemical Weapons in 6 Hours
    submitted by /u/TheCnt23 [link] [comments]
    Stopping 'them' from spying on you: New AI can block rogue microphones
    submitted by /u/KelliaMcclure [link] [comments]
    which courses are good for complete beginners?
Hello everyone, can someone recommend some good courses? I saw some on Udemy; is this one worth it? https://www.udemy.com/course/artificial-intelligence-az/ Or can I learn everything on YouTube? There are a few more on Udemy but I don't know how good they are. Is it worth buying one of those, or are there better videos on YouTube? EDIT: I found another 4 courses: https://www.udemy.com/course/100-days-of-code/ https://www.udemy.com/course/complete-python-bootcamp/ https://www.udemy.com/course/python-for-data-science-and-machine-learning-bootcamp/ https://www.udemy.com/course/machinelearning/ Which one of them would you recommend the most? submitted by /u/Edrixor [link] [comments]  ( 1 min )
    AI will make us dumb: [2204.07888] AI, Ageing and Brain-Work Productivity: Technological Change in Professional Japanese Chess
    submitted by /u/kg4jxt [link] [comments]  ( 1 min )
    Any good resources to learn Default Theory?
I am having a difficult time understanding default theory and the various methods, e.g. Makinson's, for finding the extensions of default theories. submitted by /u/cocag13996 [link] [comments]
    I know that the voice in this video is made using Replica Studio's engine, but does anyone know which voice exactly was used?
I looked through the available ones; not a single one seems to match it. Sorry if this isn't the right sub to ask, but since Replica Studios doesn't have its own sub I don't know where else to go. submitted by /u/AxySmarts [link] [comments]  ( 1 min )
    Artificial Nightmares: Schizophrenia || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    [D] Resources for Images Anomaly Detection
Hello all, I know that there is a lot going on in this field. I would like to get started on it and study more, and as always, I like to start from the basics. Do you have any resources (video, article, book) that are good to start with? I know there are autoencoders and statistical models, but how do I learn more? Where/how do you keep studying? submitted by /u/bollolo [link] [comments]  ( 1 min )
    [R] Where can I find case studies on different ML projects?
    I am working on my research paper and would like to find resources which show the case studies of ML projects from the beginning to the end, doesn't matter if it failed or succeeded. submitted by /u/mkonu [link] [comments]
    [D] Create Labels for Data created by a GAN
Hello there! I hope you have a great day! Currently I want to compare how good multiple GANs (vanilla GAN, WGAN, DCGAN, ...) are for a given use case. Therefore I trained the various GAN versions with data of two different classes (e.g. apple and banana). Now I want to show that data I generate with the generator can be used to train, e.g., a classifier that can distinguish between real images of apples and bananas. Can I somehow create labels for the generated data in a smart way, so that I know that a generated image should, for example, be an apple? How do I do that? submitted by /u/Bonkikong [link] [comments]  ( 1 min )
[P] Luminide: new Early Ranking optimization achieves higher-accuracy AI models
    Luminide introduces a new optimization, called Early Ranking, which makes it easier to build better AI models. Early Ranking achieves the same AI training results with up to 10x less compute – this saves time, reduces costs, and increases model accuracy. Luminide's IDE is a customized version of JupyterLab with integrated AI dev tools. Luminide used Early Ranking to place Top 1% in the CVPR Plant Pathology Kaggle competition. You can read about how we developed our winning model, and you can too, in our new blog post: Better Automation for Higher Accuracy AI Models. Class activation maps give insights into Luminide's winning model. Luminide is a new cloud platform for AI model development. Check out our demo video for a quick overview, or try it for yourself (sign up today and receive 100 hours of free GPU cloud compute). submitted by /u/LuminideInc [link] [comments]  ( 1 min )
    [D] generic discussion on freelance ML engineers
Hi, Reddit. Recently I've been looking into the freelancer career path. Currently, I'm a researcher at a top company. So far I know there are Toptal, Upwork, and Freelancer. I checked them out, and it seems with Toptal you still end up working for a large corporation, mostly as a full-time contractor, which is not really a different or better option than my current work. Freelancer has too many bidders from developing countries. Besides what platform to use, I have more questions about what obstacles we face as freelance ML engineers. Even though I am in AI and a researcher, I have never deployed a model in production. Usually a task at a big company takes a team or multiple teams to complete the MLOps lifecycle; how can you do it as a single person? Any sharing of experience would be of great help. submitted by /u/meame2010 [link] [comments]  ( 2 min )
    [R] Looking for AI/ML experts from Southeast Asia to interview for master thesis
    Hello everyone, I am a student from Germany writing my master thesis on Digital Transformation in ASEAN with AI/ML. For my thesis I would like to interview AI/ML experts from the ASEAN region to talk about the digital development of each country, challenges and potentials. (If you are not native there, but you have a work connection or just knowledge about the region and its AI development, I appreciate that as well.) It would be awesome if some of you were open to talk to me. A few sentences are enough, I won't take much of your time. If you want, we can do a video call as well. I will quote you of course. Thank you guys. submitted by /u/BlueLagoon357 [link] [comments]  ( 1 min )
    Dealing with numerically 0 likelihood in probabilistic models [R]
I'm trying to find literature on solving the following issue: In most probabilistic ML models, we model the joint distribution over a set of random variables, p(x1, ..., xN). If N is very large (e.g. 100, 500, or even 1000), then regardless of how you model this, the distribution's highest point of density is still quite tiny. E.g. if you consider an isotropic multivariate Gaussian of 100 dimensions, the highest point of density will be somewhere in the neighbourhood of 1.6e-40. So when it comes time to evaluate the likelihood for a model like this, the probability is numerically 0, and the log probability goes to negative infinity. Is there work on solving these kinds of issues, e.g. by constraining the model in some way, or scaling model output? I've done some googling, but am having a hard time finding papers on the subject. I'm not even sure what to call the problem... curse of dimensionality in PGMs? Any recommendations of papers / talks / etc. are greatly appreciated! submitted by /u/CS_Student95 [link] [comments]  ( 1 min )
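For anyone hitting the same wall: the standard fix is to never materialize the density at all and to work with log-densities end to end, summing per-dimension log terms so nothing underflows. A minimal NumPy sketch (isotropic Gaussian chosen purely for illustration):

```python
import numpy as np

def gaussian_logpdf(x, mean, var):
    """Log-density of an isotropic Gaussian, computed entirely in log space."""
    d = x.shape[-1]
    return -0.5 * (d * np.log(2 * np.pi * var)
                   + np.sum((x - mean) ** 2, axis=-1) / var)

x = np.zeros(1000)                  # mode of a 1000-d standard Gaussian
logp = gaussian_logpdf(x, 0.0, 1.0)
print(round(logp, 1))               # -918.9: easily representable in float64
print(np.exp(logp))                 # 0.0: the density itself underflows float64
```

A log-density of roughly -919 is perfectly representable even though the corresponding density is far below the smallest positive double, which is why probabilistic frameworks expose log_prob rather than prob.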
    [R][P] GAN-Control: Explicitly Controllable GANs + Gradio Web Demo
https://i.redd.it/v61jw1fekiu81.gif Abstract: We present a framework for training GANs with explicit control over generated facial images. We are able to control the generated image by setting exact attributes such as age, pose, expression, etc. Most approaches for manipulating GAN-generated images achieve partial control by leveraging the latent space disentanglement properties, obtained implicitly after standard GAN training. Such methods are able to change the relative intensity of certain attributes, but not explicitly set their values. Recently proposed methods, designed for explicit control over human faces, harness morphable 3D face models (3DMM) to allow fine-grained control capabilities in GANs. Unlike these methods, our control is not constrained to 3DMM parameters and is extendable beyond the domain of human faces. Using contrastive learning, we obtain GANs with an explicitly disentangled latent space. This disentanglement is utilized to train control-encoders mapping human-interpretable inputs to suitable latent vectors, thus allowing explicit control. In the domain of human faces we demonstrate control over identity, age, pose, expression, hair color and illumination. We also demonstrate control capabilities of our framework in the domains of painted portraits and dog image generation. We demonstrate that our approach achieves state-of-the-art performance both qualitatively and quantitatively. submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] Who funds the leading conferences in the field?
I know that the publishers of the leading journals are mostly for-profit organizations, which is strange, because as researchers in the field we effectively "volunteer" free peer review and even pay to publish and read papers. On the other hand, I wasn't able to find information about the funding and profit goals of the leading conferences. Take NeurIPS for example: I found that it is organized by the "NeurIPS Foundation", but what exactly is this foundation? I couldn't find any information on the subject. My point is, if the conferences are non-profit, it sounds like they should be preferred over funding for-profit organizations. submitted by /u/Careful_Winner_2335 [link] [comments]  ( 1 min )
    [D] NLP has HuggingFace, what does Computer Vision have?
    I've been writing tutorials with Pinferencia and HuggingFace. HuggingFace is quite handy and easy to use. I want to write some tutorial about computer vision afterwards. Is there anything similar in Computer vision area? submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 2 min )
[D] Why do no papers in Speech Emotion Recognition train on multiple datasets?
I took a look at several of them and was curious why they benchmark on multiple datasets but, for training, restrict themselves to only one instead of merging them. As a result, they get good scores on the one they trained on, but bad ones on the rest. submitted by /u/raysamram [link] [comments]  ( 1 min )
    [P] SparseServer.UI : A UI to test performance of Sparse Transformers
You can now load multiple transformers (each model has a unique sparsification recipe) on top of the DeepSparse server behind Streamlit, and it's open-source. This was battle-tested on a virtual machine with 16 GB of RAM and only a 4-core CPU. These compute requirements are enough to load up to 19 sparse BERT models in memory and compare their performance on question answering (P.S. they are really fast on just CPUs). 💻code: https://github.com/neuralmagic/deepsparse/tree/main/examples/sparseserver-ui submitted by /u/Quantum_Stat [link] [comments]  ( 1 min )
    [Research] Learning with Signatures
    This paper reports "results on AFHQ dataset, Four Shapes, MNIST and CIFAR10 achieving 100% accuracy on all tasks." The authors used few-shot classification "by comparing each test sample (after optional augmentation and computation of the element-wise mean) against a representative element-wise mean signature computed by averaging the signatures of a given number of train samples." What are your thoughts on this? Learning with Signatures - https://arxiv.org/abs/2204.07953 submitted by /u/Marmadelov [link] [comments]  ( 1 min )
    [Project] [Research] Simple Speech Recognition System
Github - Bangla Spoken Number Recognition; Dataset - our custom dataset on Bangla numerals; Publications - though it's on the (0-9) digits. We have created a simple speech recognition system for recognizing Bangla numerals from '০-৯৯' (0-99). In this project, audio samples from different genders, age groups, and dialects of Bangladeshi people were used to create a speech dataset of spoken numbers from '০-৯৯' (0-99). The raw speech data is subjected to various audio augmentation techniques such as time shifting, speed tuning, background noise mixing, and volume tuning. Then, to extract meaningful features from the data, Mel Frequency Cepstrum Coefficients (MFCCs) are used. We used Convolutional Neural Networks (CNNs) to develop a Bangla number recognition system. The proposed method recognizes '০-৯৯' (0-99) Bangla spoken numbers with 89.61% accuracy across the entire dataset. The model’s effectiveness was also tested using 10-fold cross-validation, with 89.74% accuracy for recognizing '০-৯৯' (0-99) Bangla spoken numbers across the entire dataset. I hope this work helps you in some way. :) submitted by /u/PIASR0Y [link] [comments]  ( 1 min )
[P] Improving multiclass classification accuracy with Jain's Fairness Index
This is a light implementation of the idea in the paper Leveraging Uncertainties in Softmax Decision-Making Models for Low-Power IoT Devices. Instead of finding uncertainties, I have added Jain's Fairness Index as an addition to the loss function. Gist: https://gist.github.com/Gananath/8d167384da7d3bc078650c73fab1a8dd submitted by /u/gananath [link] [comments]
    [D] Are workshop papers considered "final publications"?
    Specifically, I'm talking about workshops of major conferences (NeurIPS, ICLR, ICML, etc.). If I submit a paper and it gets accepted, is that workshop paper a "final publication"? Or would most people expect the project to continue being developed into a slightly larger/longer paper for submission to the main stream of a conference? And if so, does publishing the earlier workshop paper tend to hinder or harm the later conference submission? I recognise there's a variety of workshops, and perhaps each have different expectations or norms. I'm wondering, from my outsider's perspective, how can I tell? For example, I have been thinking about submitting to one of these ICML workshops: https://icml-compbio.github.io/ or https://www.tagds.com/workshops/tag-in-machine-learning. Is there an easy way to tell whether either or both of these are "final publication" venues or not? submitted by /u/tfburns [link] [comments]  ( 2 min )
    [R] Maximum likelihood estimation can fail due to "Manifold Overfitting"
    arXiv: https://arxiv.org/abs/2204.07172 This paper out today seems to make the bold claim that maximum likelihood estimation is not a well-posed training objective in deep generative modelling. The manifold hypothesis says that observed high-dimensional data clusters around low-dimensional manifolds, but maximum likelihood methods (e.g. VAE, normalizing flows) learn high-dimensional densities. The paper argues that the mismatch between dimensionalities will lead to a problem called "manifold overfitting". Models are able to maximize likelihood in high-dimensions by sending the density to infinity around the low-dimensional manifold, but they can do this while completely ignoring the distribution of data on the manifold. So in other words, high capacity models will learn the data manifold…  ( 5 min )
  • Open

    "Reinforcement Learning with Action-Free Pre-Training from Videos", Seo et al 2022
    submitted by /u/gwern [link] [comments]
"Inferring Rewards from Language in Context", Lin et al 2022
    submitted by /u/gwern [link] [comments]
    Bandit problems as sequential decision problems
    Any reinforcement learning problem can be modeled as a sequential decision problem (SDP), which can always be modeled as a Markov decision process (need to model the state carefully). An example of an SDP is a multiarmed bandit problem, where the state is the vector of beliefs about the performance of each arm (or beliefs about a continuous parametric model). Decisions are made by a policy, and there are four classes of policies. For some reason, the RL community tends to focus on just one of the four classes (UCB policies, which fall in the class of cost function approximations), but there are entire communities using each of the other three classes. See chapter 7 of my new book (https://castlelab.princeton.edu/RLSO/) for a complete summary of the four classes of policies for pure learning problems (aka bandit problems). Note that Sutton and Barto (2nd edition) cover bandit problems in chapter 2, and then introduce MDPs in chapter 3. A bandit problem *is* an MDP! submitted by /u/powell-sda [link] [comments]  ( 1 min )
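For readers unfamiliar with the UCB class of policies mentioned above, here is a hedged sketch of the standard UCB1 rule; it is not taken from the book, and the exploration constant and toy arm parameters are illustrative:

```python
import numpy as np

def ucb1_action(counts, means, t, c=2.0):
    """Pick an arm by UCB1: empirical mean plus an exploration
    bonus that shrinks as an arm accumulates pulls."""
    counts = np.asarray(counts, dtype=float)
    if np.any(counts == 0):                 # pull every arm once first
        return int(np.argmin(counts))
    bonus = np.sqrt(c * np.log(t) / counts)
    return int(np.argmax(np.asarray(means) + bonus))

# tiny simulation: arm 1 has the higher true success probability
rng = np.random.default_rng(0)
true_means = [0.3, 0.7]
counts, sums = np.zeros(2), np.zeros(2)
for t in range(1, 1001):
    a = ucb1_action(counts, sums / np.maximum(counts, 1), t)
    counts[a] += 1
    sums[a] += rng.random() < true_means[a]
print(counts)  # the better arm (index 1) receives most of the pulls
```

Note how the state the post describes (beliefs about each arm's performance) is exactly the (counts, means) pair the policy conditions on.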
    Getting started with UAV/drone control
    Hi, is it currently possible to train a UAV and implement the policy it in real-life? I understand there are different environments for training, e.g. AirSim, GymFC, and others. However, the interesting part for me is the link to the real world: Is there a way to directly implement any learned policy on a real drone, e.g. a commercially available quad-copter? Which UAV would support such a functionality? I'd love to get started on training drones for RL purposes (search and rescue, etc), but if there is no way to test it in real-life then this would be disappointing. submitted by /u/FrankTheThanks [link] [comments]  ( 1 min )
    Question about Expected Sarsa for prediction vs control
I am having a hard time figuring out what makes the difference between Expected Sarsa for prediction vs for control. For off-policy Expected Sarsa, I believe it's possible to use one epsilon value for a target policy that is epsilon-greedy and another epsilon value for a behaviour policy that is epsilon-greedy. The target policy would be used within the expected-value calculation in the update of Q(S,A), the action-value function, and the behaviour policy would be used to choose actions from the current state. But I'm not sure how to differentiate the control version of the algorithm from the prediction version. I think prediction usually finds the state-value function, but I know that on-line Sarsa for prediction uses Q(S,A), so I'm not sure how to determine the difference between prediction and control algorithms. submitted by /u/lifelifebalance [link] [comments]  ( 1 min )
    exploration strategies in discrete action spaces
Hello there, I am working on the Missile Command game, and as a baseline I mostly use rllib/PPO. The algorithm never converges; I suspect it is because of a lack of exploration. Since the timesteps are small, the target usually oscillates around the center of the screen, and it is impossible to explore going near the border and then explore firing (to counter an incoming missile). What methods should I try? Moreover, I have already done reward scaling and frame stacking. Any suggestions for solving this game are much appreciated. Last question: do you know of similar (and common) environments that have been solved? Maybe their solutions show the path to follow. Thank you :) submitted by /u/Street_Excitement_14 [link] [comments]  ( 1 min )
    Confusion of hyperparameters in ppo
I'm reading the PPO paper https://arxiv.org/abs/1707.06347 and I'm confused about the hyperparameters in table 4: Log stdev. of action distribution | LinearAnneal(-0.7, -1.6). To the best of my knowledge, under the continuous setting the policy outputs a mean and std, so why is the stdev of the action distribution given as a hyperparameter, and what is LinearAnneal in detail? submitted by /u/StrawberryTemporary7 [link] [comments]  ( 1 min )
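For context, LinearAnneal(a, b) in that table denotes a value interpolated linearly from a to b over the course of training, so there the log-std is treated as a fixed, annealed hyperparameter rather than a network output. A hypothetical sketch of such a schedule (function name and step counts are illustrative):

```python
def linear_anneal(start, end, step, total_steps):
    """Linearly interpolate from `start` to `end` as training progresses,
    clamping at `end` once training passes `total_steps`."""
    frac = min(step / total_steps, 1.0)
    return start + frac * (end - start)

# log-std of the action distribution, annealed from -0.7 to -1.6
log_std = linear_anneal(-0.7, -1.6, step=500_000, total_steps=1_000_000)
print(round(log_std, 2))  # -1.15, halfway through training
```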
    Need help about categorical dqn
I don't know how the projection of TZ onto Z works, and I also don't understand the formula. Can someone do a step-by-step calculation as a demo? submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
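For reference, the projection step in categorical DQN (C51) maps each shifted atom Tz = r + γz back onto the fixed support by splitting its probability mass between the two nearest atoms, in proportion to distance. A hedged NumPy sketch of that standard procedure (scalar reward, single transition, purely for illustration):

```python
import numpy as np

def project_distribution(reward, probs, z, gamma, v_min, v_max):
    """Project the shifted target atoms Tz = r + gamma*z back onto the
    fixed support z, splitting each atom's probability between its two
    nearest support neighbours."""
    n_atoms = len(z)
    delta_z = (v_max - v_min) / (n_atoms - 1)
    tz = np.clip(reward + gamma * z, v_min, v_max)  # shifted, clipped atoms
    b = (tz - v_min) / delta_z                      # fractional index on support
    l, u = np.floor(b).astype(int), np.ceil(b).astype(int)
    m = np.zeros(n_atoms)
    for j in range(n_atoms):
        if l[j] == u[j]:                            # lands exactly on a support atom
            m[l[j]] += probs[j]
        else:                                       # split mass between neighbours
            m[l[j]] += probs[j] * (u[j] - b[j])
            m[u[j]] += probs[j] * (b[j] - l[j])
    return m

z = np.linspace(-10, 10, 51)       # the fixed 51-atom support
probs = np.full(51, 1 / 51)        # a uniform target distribution
m = project_distribution(1.0, probs, z, 0.99, -10.0, 10.0)
print(round(m.sum(), 6))           # 1.0 — the projection preserves probability mass
```

Each step of the loop is the "step-by-step calculation" in the formula: compute where the shifted atom lands (b), find its neighbours (l, u), and distribute its mass.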
  • Open

    Overview of Relational Graph Convolutional Networks (RGCN)
    submitted by /u/aidev2040 [link] [comments]
    This is a long shot, but does anyone remember...
    Hi, this is a very long shot. I have been trying to remember the name of a science TV show which aired in the UK back in the 90's. It focussed on Neural Networks and gave some brilliant examples of environmental sensing. There was also a section showing a simple voice synthesiser which "babbled" like a child. I thought it may have been an "Horizon" show, however, I have been through the list of shows from that time and none appear to be right. If anyone has a memory of this show please let me know. One of the visuals I remember was a plastic skull with an LED matrix inside showing patterns. Obviously this was just some smoke and mirrors, however, it may trigger a memory. I'm trying to recall something from best part of 30 years ago.. submitted by /u/_m0xya_ [link] [comments]  ( 1 min )
  • Open

    10 seats remaining | A series of live ML strategy workshops
    Sponsored Post Unlike traditional online courses, Foster Provost’s workshops will give you the chance to engage live with a world-class […] The post 10 seats remaining | A series of live ML strategy workshops appeared first on Machine Learning Mastery.  ( 2 min )
  • Open

    Learning to Prompt for Continual Learning
    Posted by Zifeng Wang, Student Researcher, and Zizhao Zhang, Software Engineer, Google Research Supervised learning is a common approach to machine learning (ML) in which the model is trained using data that is labeled appropriately for the task at hand. Ordinary supervised learning trains on independent and identically distributed (IID) data, where all training examples are sampled from a fixed set of classes, and the model has access to these examples throughout the entire training phase. In contrast, continual learning tackles the problem of training a single model on changing data distributions where different classification tasks are presented sequentially. This is particularly important, for example, to enable autonomous agents to process and interpret continuous streams of informati…  ( 7 min )
  • Open

    Integrate ServiceNow with Amazon Lex chatbot for ticket processing
    Conversational interfaces (or chatbots) can provide an intuitive interface for processes such as creating and monitoring tickets. Let’s consider a situation in which a recent hire on your team is required to cut tickets for office equipment. To do so, they have to interact with a ticketing software that the organization uses. This often requires […]  ( 10 min )
  • Open

    Inversion in a circle
    Inversion in the unit circle is a way of turning the circle inside-out. Everything that was inside the circle goes outside the circle, and everything that was outside the circle comes in. Not only is the disk turned inside-out, the same thing happens along each ray going out from the origin. Points on that ray […] Inversion in a circle first appeared on John D. Cook.  ( 2 min )
  • Open

    Don’t let data drift derail edge compute machine learning models
    Edge computing has come of age, with deployments enabling many applications that process data from IoT sensors and cameras. In 2017, we identified the symbiotic relationship between edge computing and video analytics in an article, noting that live video analytics is the “killer app” for edge computing. Edge devices come in various shapes and sizes […] The post Don’t let data drift derail edge compute machine learning models appeared first on Microsoft Research.  ( 5 min )
  • Open

    A Quick Guide To Find The Right Minds For Annotation Is So Famous, But Why?
    Shared duties have always been the most critical component of every successful organization, regardless of its nature or size. When it…  ( 4 min )
  • Open

    Welcome ‘In the NVIDIA Studio’: A Weekly Celebration of Extraordinary Artists, Their Inspiring Art and Innovative Techniques
    Creating content is no longer tethered to using paint and stone as mediums, nor being in massive studios. Visual art can now be created anywhere, anytime. But being creative is still challenging and time-consuming. NVIDIA is making artistic workflows easier and faster by giving creators tools that enable them to remain in their flow state. Read article > The post Welcome ‘In the NVIDIA Studio’: A Weekly Celebration of Extraordinary Artists, Their Inspiring Art and Innovative Techniques appeared first on NVIDIA Blog.  ( 4 min )

  • Open

What are your hopes and worries about future humanoid artificial intelligence?
    submitted by /u/Upset_Force66 [link] [comments]
    AI Dream 36 - Psychedelic Special (4K 40Mbit Test)
    submitted by /u/LordPewPew777 [link] [comments]
    AI Startups and the Hunt for Tech Talent in Vietnam
    submitted by /u/regalalgorithm [link] [comments]
    We don't have echolocation
    submitted by /u/tezdhar [link] [comments]
    These 3-Michelin-starred plates were invented by AI. The food doesn’t even exist
    submitted by /u/jonfla [link] [comments]
    Getting in shape while homeworking by force locking the screen and using blazepose pose estimation to detect pushups to unlock it again.
    submitted by /u/ThePyCoder [link] [comments]
    Last Week in AI: AI chip startup funding doubled in the last 5 years, new AI applications in hospitals and restaurants, Cruise robotaxi pulled over by police in SF, and more!
    https://lastweekin.ai/p/163?s=w submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
YouTubers create a completely AI-generated "influencer."
    submitted by /u/savetheattack [link] [comments]
    FOMO is a TinyML neural network for real-time object detection
    submitted by /u/bendee983 [link] [comments]
    An online course with an AI tutor achieves a significantly higher completion rate than traditional online courses thanks to a personalized learning experience.
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Protein Folding Neural Networks (e.g RoseTTAFold) Are Not Robust
    submitted by /u/qptbook [link] [comments]
    Witch of the Barthe
    submitted by /u/Hacknaut [link] [comments]
    Why is it called tensorflow and not matrixflow?
Hello, I'm MB. A very nice and polite guy. Why is it called TensorFlow and not MatrixFlow? AI is all about matrix multiplications, right? So why use the word tensor instead? I know what a tensor is, kind of, but isn't AI primarily about matrix multiplications rather than tensor multiplications? ELI5 please. submitted by /u/MountBlanc [link] [comments]  ( 2 min )
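One concrete way to see the distinction: deep-learning data routinely carries more than two axes, and the framework's basic object is an n-dimensional array (a "tensor" in the loose ML sense), of which a matrix is just the rank-2 case. A quick NumPy illustration:

```python
import numpy as np

scalar = np.array(5.0)            # rank-0 tensor: a single number
vector = np.array([1.0, 2.0])     # rank-1: one axis
matrix = np.ones((2, 3))          # rank-2: the familiar matrix
batch  = np.ones((32, 28, 28, 3)) # rank-4: a batch of 28x28 RGB images
print(batch.ndim)                 # 4 — more axes than any matrix can hold
```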
    Society
    submitted by /u/booksmoothie [link] [comments]
    Bioinspired multisensory neural network with crossmodal integration and recognition
    submitted by /u/booksmoothie [link] [comments]
    Realistic animal movement
I am working on a robotic pet that has lots of movement capability but is simply scripted and will unnaturally jump between movement sets without considering the current movement. What branch of AI should I look into learning about? Currently I use mostly Python for high-level code and C for microcontrollers. submitted by /u/uMinded [link] [comments]
  • Open

    A3C vs federated learning?
Hi, I see this question was asked before but I am still not convinced there is a difference between the two. How are asynchronous distributed RL (A3C) and federated learning different? It seems like the basic idea behind them is the same: the agents train in their own environments and only share gradients with the server. Is the difference only in terms of the domain they are applied in? Is it just ML vs RL? submitted by /u/uneasy_daisy [link] [comments]  ( 1 min )
    Can polyak averaging neural networks lead to numerical instability?
In Soft Actor-Critic several Q networks are used. Target Q networks are gradually updated to match the other Q networks. See step 15 here: https://spinningup.openai.com/en/latest/algorithms/sac.html#pseudocode I've heard this called polyak averaging. Let's say we have two weights from two neural networks: W1 from one network, and W2 is the corresponding weight from the other network. Polyak averaging averages these weights as follows: W_average = W1 * p + W2 * (1-p) When p is 0.5, it's an evenly weighted average. If p is high, then W1 is weighted more heavily than W2, etc. My question is: does this method of averaging weights lead to numerically unstable neural networks? This technique is often used to gradually transform one neural network into another on a weight-by-weight basis, but there is no guarantee that all intermediate neural networks are well behaved (at least, none that I'm aware of). Whereas gradient descent with small enough step sizes should, theoretically, keep a neural network well behaved, I don't think those same theoretical guarantees apply to polyak-averaged neural networks. What do you think? submitted by /u/Buttons840 [link] [comments]  ( 2 min )
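The update in the question is easy to write down directly; in practice p is close to 1, so each step nudges the target network only slightly and the target converges geometrically toward the online weights. A minimal NumPy sketch (toy 2x2 "weights" and the p value are illustrative):

```python
import numpy as np

def polyak_update(target_params, online_params, p=0.995):
    """In-place soft update: target <- p * target + (1 - p) * online."""
    for w_t, w_o in zip(target_params, online_params):
        w_t *= p
        w_t += (1.0 - p) * w_o

target = [np.zeros((2, 2))]   # target network starts at 0
online = [np.ones((2, 2))]    # online network fixed at 1
for _ in range(1000):
    polyak_update(target, online, p=0.995)
print(target[0][0, 0])        # 1 - 0.995**1000 ≈ 0.993: geometric convergence
```

With a fixed online network the target after n steps is exactly 1 - p^n of the way there, so the averaging itself is numerically benign; the open question in the post is about the behaviour of the intermediate networks, not the arithmetic.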
  • Open

    [D] Word Meaning Dictionary Dataset
Hey all! So I intend to make an application that, very naively speaking, outputs synonyms of a given word regardless of context (e.g. if word1 is "bank", the model should output both "money" and "river", and the order does not matter). For this, I intend to use a Doc2Vec-type classifier, where the meanings of each word can serve as a document, and then similar words can easily be returned using a cosine similarity function. I chose this over classic Word2Vec as it will be able to handle uncommon words (which, blimey, the English language has a lot of) that would otherwise be processed as unknown tokens. To this end, I am searching for a suitable dataset. Any ideas? submitted by /u/GrammarPaparazzi [link] [comments]  ( 1 min )
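Whatever dataset ends up being used, the retrieval step described above reduces to cosine similarity between a query vector and precomputed per-sense vectors. A toy sketch with made-up 3-d vectors standing in for Doc2Vec embeddings of dictionary glosses (all names and numbers hypothetical):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy "sense" vectors standing in for embeddings of dictionary definitions
senses = {
    "bank(finance)": np.array([0.9, 0.1, 0.0]),
    "bank(river)":   np.array([0.1, 0.9, 0.0]),
}
query_money = np.array([0.8, 0.2, 0.1])   # a money-like query vector
ranked = sorted(senses,
                key=lambda s: cosine_similarity(senses[s], query_money),
                reverse=True)
print(ranked[0])  # bank(finance): the money-like query matches the finance sense
```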
    [D] Is there a way to use a series of videos as the predictor variable for prediction/regression?
    This is the problem area I am working with: I have a series of videos taken at different times, and each video is paired with a physical variable. The videos contain information that correlates with the physical variable. What we want to do is use the information encoded within each video to build a correlation model with the physical quantity, and thereafter use new videos to predict the physical quantity. (We want to avoid the route of video -> CNN -> extract parameters -> build model with parameters. Instead, we want to directly go from the videos to the model without separately extracting parameters.) So, in a way, I want to use a series of videos as a time series data set. Is there a way to do this? What should be the starting point for my research into this? Thanks in advance! I am not an expert with this area at all, and would greatly appreciate guidance from the community. submitted by /u/besse [link] [comments]  ( 1 min )
    [P] Blog post + open-source PyTorch implementation of DeepMind's SIMONe (unsupervised scene decomposition)
    Hi all! My team recently reproduced and published a PyTorch implementation of the paper SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition. Our blog post walks through the code and provides a detailed explanation of the architecture they use in order to perform object segmentation on videos in a fully self-supervised manner. Hope this is helpful/interesting to others! submitted by /u/ai_ellie [link] [comments]  ( 1 min )
    [D] AutoRF vs SinNeRF
    Both approaches seem to be able to render complex scenes from a single view, without the need for explicit priors or pretrained feature extractors. Conveniently, AutoRF doesn't mention SinNeRF. What are the similarities and differences between the two approaches? DISCLAIMER - I'm not a NeRF expert. My limited understanding of it is that we train a small MLP to regress the radiance field for a scene, i.e., to predict emitted radiance at a point (x,y,z) in the viewing direction (θ, φ). Once we have the radiance field, we can use some rendering engine to render a 2D view from the 3D field and the camera parameters. EDIT: I just realized that I didn't link the papers, how silly of me. Here they are: SinNeRF: https://arxiv.org/abs/2204.00928 AutoRF: https://arxiv.org/abs/2204.03593 submitted by /u/Best-Neat-9439 [link] [comments]  ( 1 min )
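    The query interface described in the disclaimer, an MLP mapping a point (x, y, z) and viewing direction (θ, φ) to emitted radiance and density, can be sketched as follows. The weights here are random for illustration (a real NeRF trains them from posed images), and all helper names are hypothetical:

```python
import math
import random

def make_mlp(sizes, seed=0):
    """Random small MLP weights -- illustrative only; a real NeRF
    trains these weights from posed images of the scene."""
    rng = random.Random(seed)
    return [[[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def mlp_forward(layers, x):
    """Forward pass with ReLU on every layer except the last."""
    for i, layer in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) for row in layer]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

def radiance_field(layers, xyz, view_dir):
    """Query the field: (x, y, z, theta, phi) -> ((r, g, b), sigma).
    Colors are squashed to [0, 1]; density is kept non-negative."""
    out = mlp_forward(layers, list(xyz) + list(view_dir))
    rgb = [1.0 / (1.0 + math.exp(-v)) for v in out[:3]]
    sigma = max(0.0, out[3])
    return rgb, sigma

# 5 inputs -> one small hidden layer -> (r, g, b, sigma)
layers = make_mlp([5, 16, 4])
rgb, sigma = radiance_field(layers, (0.1, 0.2, 0.3), (0.5, 1.0))
```

    A renderer would then integrate these (rgb, sigma) samples along camera rays to produce a 2D image.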
    [P] Evaluating automatic paraphrasing via BLEU, LaBSE, Perplexity and Jaccard similarity index - how we do it for Linguix Paraphraser 2.0
    Hey everyone! Our NLP team, led by our expert Daria, has recently released a new AI-based paraphrasing feature – Linguix Paraphraser 2.0. To measure its quality, we use four important metrics: BLEU, the Jaccard similarity index, LaBSE and perplexity. Performance stats: BLEU is used for measuring the quality of machine translation; the lower it is for the rephrase task, the better. Right now, Linguix Paraphraser 2.0 has a BLEU score of 0.47 (the previous iteration had 0.65). So, we can say that our paraphraser is now smarter: it uses more words to rewrite the sentence, but the overall idea of the content is still preserved. The Jaccard similarity index is used to measure the likeness of two objects. As with BLEU, the lower the index for the task, the better. Our current metric is 0.45 compared to 0.51 for the previous iteration. The LaBSE metric is used to measure the semantic similarity of two sentences. It translates text into vectors so that vectors of texts close in meaning are geometrically close to each other. The higher the metric, the better. The new model has a LaBSE similarity slightly lower than the previous model: 0.80 vs 0.93, which is normal and correct, because the model generates a variety of variants using other words while keeping the meaning of the source text in the target. Perplexity is used to ensure the rewritten content sounds natural (lower perplexity is better). The naturalness of the rewrites generated by our new paraphraser is much better than before: 0.26 vs 4.99 for the prior version. https://i.redd.it/iaaf7o2iibu81.gif As such, for Linguix Paraphraser 2.0 we were able to improve the quality of the rephrased content, while keeping the text meaning at the same level. P.S. Daria is somewhat shy, so I asked her to share the update here on her behalf. Anyway, she'll be pleased to see some feedback! submitted by /u/alexlash [link] [comments]  ( 1 min )
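    Of the four metrics, the Jaccard similarity index is the simplest to compute over word tokens; a small sketch (the example sentences are made up, not from Linguix's evaluation data):

```python
def jaccard_similarity(a, b):
    """Jaccard index of two token sets: |A intersect B| / |A union B|.
    For paraphrasing, a LOWER score between source and rewrite means
    more of the surface wording was changed."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

src = "the quick brown fox jumps over the lazy dog".split()
out = "a fast brown fox leaps over the lazy dog".split()
score = jaccard_similarity(src, out)  # shared tokens / all distinct tokens
```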
    [R] [P] Slideflow: a deep learning framework for digital histology
    Hi all - I'm an applied ML researcher working in an oncology research lab at U Chicago, using digital slides of patients' tumors for tumor classification, prognostication, and treatment response prediction. I'm really excited to share with the community the deep learning tools we've been using, and I'm hoping for any feedback you might have (or direction if you think there's a community or subreddit this might be better suited for). After years of development, we've released our open-source deep learning framework for digital histology, Slideflow (https://github.com/jamesdolezal/slideflow). It has flexible and highly optimized whole-slide image processing, support for a wide variety of existing and custom architectures (with continuous, categorical, or time-series outcomes), real-time digital stain normalization, a number of explainability tools, and integrated uncertainty quantification. It's compatible with both Tensorflow and PyTorch, available on PyPI and DockerHub, and comes with good documentation (https://slideflow.dev/). We've tried out a number of alternative frameworks over the years, and I think the ease of use, flexibility, and performance optimizations set it apart from other repos you'll find on GitHub. We have a handful of local collaborators who are using Slideflow, but I'm hoping to expand our reach and find people in similar fields who are interested in collaborating for ongoing open-source development. I've looked for subs relating specifically to computational pathology / digital histology and haven't found a good community yet - anyone have ideas for how to get connected with like-minded people working in the same field? submitted by /u/shawarma_bees [link] [comments]  ( 2 min )
    [D] Anyone using named tensors or a tensor annotation lib productively?
    It seems like there have been some options out for a while now - e.g. native pytorch named tensors, tsalib, torchtyping - yet I haven't really seen them discussed or used in any code I've come across. Just wondering if anyone has surveyed them recently and is using them. In particular tsalib's warp string syntax for transformations looks really interesting. submitted by /u/patniemeyer [link] [comments]  ( 1 min )
    [D] Are there any analog A.I. computing chips on the retail market yet?
    If so, where can you buy them? (For example: I read that Mythic had been collecting funding in mid-2021, but I don't know if their chips are for sale anywhere.) submitted by /u/GerritTheBerrit [link] [comments]  ( 2 min )
    [N][R][P] High fidelity 3D face reconstruction from monocular image
    NextFace is an open-source PyTorch library for high fidelity 3D face reconstruction from single/multiple RGB image(s). github.com/abdallahdib/NextFace submitted by /u/Abd_dib [link] [comments]
    [D] Which keywords describe my task?
    Hey all, I have received a task in an area I am unfamiliar with and need a little help finding suitable papers, so I am looking for keywords. To illustrate the goal, let's say you have 10000 screws (which can be of the same model) and you want to be able to recognize/match each one. New screws are added all the time, so you also need to handle the case where the object was previously unknown when performing the match. The goal is to develop a capturing system that produces suitable images and to find an architecture/algorithm that is as robust as possible. The object images should be invariant to illumination, rotation and translation during acquisition. It should be a kind of barcode/hash without any additional symbol, based only on the structure of the object. Is there a name for such a task? I think it is not really classification in the classical sense. I guess it might just be a clever way of finding suitable features for each individual object structure and a suitable distance function. Sorry for the long post, I appreciate any help. submitted by /u/Temporary_Lab769 [link] [comments]  ( 1 min )
    [D] Including outer objects in RNN / CNN
    Hello there, Which layer or structure would you append to existing machine learning architectures like YOLOv5 in order to not only detect the specific object, but also the object which it is part of? Let's say there are X-ray images of laptops: the laptop itself will be detected, and also something like the hard drive or battery inside of it. Is it possible to make the CNN/RNN aware of the fact that the hard drive or battery is inside the laptop? Hope someone can tell what I mean. Regards, David submitted by /u/rohrivibes [link] [comments]  ( 1 min )
    [D] PhD in knowledge representation and reasoning for autonomous agent: research landscape
    I have been offered a PhD in the domain of knowledge representation and reasoning for autonomous agents. The goal is to represent textual rules and world knowledge and then use that represented knowledge for reasoning, so that the motion of an autonomous agent can be predicted. I have a question regarding the current landscape of knowledge representation and reasoning. I see more and more work on data-focused models, with classical logic and associated approaches fading out. The PhD project problem itself looks interesting, as it focuses on work where there is less need for data and motion can be planned in unseen scenarios. But I am concerned about the future career prospects in this domain, where the problem is tackled by knowledge representation and reasoning. As far as I can see, there is less and less funding in this domain. What is your take on the future landscape of research directions in this domain? submitted by /u/human_treadstone [link] [comments]  ( 2 min )
    [P] My blog on ML model evaluation (Bayes optimal decisions, ROC curve, LLR calibration)
    I have published 3 articles about ML model evaluation on my personal blog. I just finished the 3rd installment, so I am keen to share and get some feedback. I cover frameworks traditionally used in ML, like ROC curves, but from a Bayes decision perspective, which I have been struggling to find in textbooks/tutorials. The 3rd part is about the evaluation of log-likelihood calibrated models. Hope you will find it interesting/useful! https://mkffl.github.io/2021/10/18/Decisions-Part-1.html https://mkffl.github.io/2021/10/28/Decisions-Part-2.html https://mkffl.github.io/2022/03/02/Decisions-Part-3.html And the underlying code for reproducibility: https://github.com/mkffl/decisions submitted by /u/mkffl [link] [comments]  ( 1 min )
    [P] ormb: Docker for Your Models, Help You Manage Models Better
    github.com/kleveross/ormb ormb helps you manage your Machine Learning/Deep Learning models with docker container image registry. It makes your models easy to create, version, share and publish.

```
# Save the model in local cache first
$ ormb save gaocegege/fashion_model:v1
ref:       gaocegege/fashion_model:v1
digest:    6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
size:      162.1 KiB
format:    SavedModel
v1: saved

# Push the model from local cache to remote registry
$ ormb push gaocegege/fashion_model:v1
The push refers to repository [gaocegege/fashion_model]
ref:       gaocegege/fashion_model:v1
digest:    6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
size:      162.1 KiB
format:    SavedModel
v1: pushed to remote (1 layer, 162.1 KiB total)

# Pull the model from remote registry to local cache
$ ormb pull gaocegege/fashion_model:v1
v1: Pulling from gaocegege/fashion_model
ref:       gaocegege/fashion_model:v1
digest:    6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
size:      162.1 KiB
Status: Downloaded newer model for gaocegege/fashion_model:v1

# Export the model from local cache to current directory
$ ormb export gaocegege/fashion_model:v1
ref:     localhost/gaocegege/fashion_model:v1
digest:  6b08cd25d01f71a09c1eb852b3a696ee2806abc749628de28a71b507f9eab996
size:    162.1 KiB

# View the local file directory
$ tree examples/SavedModel-fashion
examples/SavedModel-fashion
├── model
│   ├── saved_model.pb
│   └── variables
│       ├── variables.data-00000-of-00001
│       └── variables.index
├── ormbfile.yaml
└── training-serving.ipynb

2 directories, 5 files
```

    submitted by /u/gaocegege [link] [comments]  ( 1 min )
    [D] Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection
    Hi, I have just published my latest Medium article. Anomalies are widespread when it comes to working with data, and they become especially important in time series, so it is crucial to propose efficient methods to detect and deal with them. This article illustrates a state-of-the-art model called DGHL for anomaly detection. DGHL uses a ConvNet as a generator and, instead of training an encoder, maximizes the likelihood with the Alternating Back-Propagation algorithm. https://rezayazdanfar.medium.com/deep-generative-model-with-hierarchical-latent-factors-for-time-series-anomaly-detection-8d6eaebad8bc submitted by /u/rezayazdanfar [link] [comments]  ( 1 min )
    [P] app to play with latent diffusion models
    just published “geni”, a new minimal app that uses Latent Diffusion Models. It will not produce DALL-E-ish results but it’s fast and great for playing with prompt engineering. Also, it’s free. would love to have the community playing with it. check it out here: https://geni.vercel.app submitted by /u/viccpopa [link] [comments]  ( 1 min )
    [R] VQ-Flows: Vector Quantized Local Normalizing Flows
    arXiv: https://arxiv.org/abs/2203.11556 Summary: We introduce a novel statistical framework for learning a mixture of local normalizing flows as "chart maps" over the data manifold. Our framework augments the expressivity of recent approaches while preserving the signature property of normalizing flows, that they admit exact density evaluation. We learn a suitable atlas of charts for the data manifold via a vector quantized auto-encoder (VQ-AE) and the distributions over them using a conditional flow. We validate experimentally that our probabilistic framework enables existing approaches to better model data distributions over complex manifolds. GitHub: Coming Soon Author here, happy to answer any questions. submitted by /u/tshrjn [link] [comments]  ( 1 min )
    Does anyone have a guess as to why my network isn’t working? (more info in comments)
    submitted by /u/-i-hate-this-place- [link] [comments]  ( 1 min )
    Guide to Iteratively Tuning GNNs
    Sponsored Post By Luis Bermudez This blog walks through a process for experimenting with hyperparameters, training algorithms and other parameters […] The post Guide to Iteratively Tuning GNNs appeared first on Machine Learning Mastery.  ( 6 min )
    Managing Data for Machine Learning Project
    Big data, labeled data, noisy data. Machine learning projects all need to look at data. Data is a critical aspect […] The post Managing Data for Machine Learning Project appeared first on Machine Learning Mastery.  ( 30 min )
    7 Tips for making your code more ‘pythonic’ and elegant
    7 use-cases where you can make your python code more nifty, concise and elegant — without compromising readability. Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 4 min )
    AI and Healthcare: AI as a Triaging Tool for Healthcare
    Healthcare offers one of the biggest areas where AI could impact people. AI in healthcare is already widespread but is expected to grow even further. The global artificial intelligence in healthcare market size was valued at USD 10.4 billion in 2021. It is expected to expand at a compound annual growth rate (CAGR) of 38.4%… Read More »AI and Healthcare: AI as a Triaging Tool for Healthcare The post AI and Healthcare: AI as a Triaging Tool for Healthcare appeared first on Data Science Central.  ( 3 min )
    Fallacy of Becoming Data-driven – Part 2: Cultural Transformation
    In my first blog of the series “Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed”, I preached about the critical importance of reframing the conversation away from becoming data-driven toward becoming value-obsessed. Instead of focusing on becoming data-driven, organizations need to focus on how to uncover the customer, product, service, and operational insights buried in… Read More »Fallacy of Becoming Data-driven – Part 2: Cultural Transformation The post Fallacy of Becoming Data-driven – Part 2: Cultural Transformation appeared first on Data Science Central.  ( 5 min )
    A Glossary of Knowledge Graph Terms
    As with many fields, knowledge graphs boast a wide array of specialized terms. This guide provides a handy reference to these concepts. Resource Description Framework (RDF) The Resource Description Framework (or RDF) is a conceptual framework established in the early 2000s by the World Wide Web Consortium for describing sets of interrelated assertions. RDF breaks… Read More »A Glossary of Knowledge Graph Terms The post A Glossary of Knowledge Graph Terms appeared first on Data Science Central.  ( 10 min )
    Auto-Gait: Automatic Ataxia Risk Assessment with Computer Vision on Gait Task Videos. (arXiv:2203.08215v2 [cs.CV] UPDATED)
    In this paper, we investigated whether we can 1) detect participants with ataxia-specific gait characteristics (risk-prediction), and 2) assess severity of ataxia from gait (severity-assessment) using computer vision. We created a dataset of 155 videos from 89 participants, 24 controls and 65 diagnosed with (or pre-manifest for) spinocerebellar ataxias (SCAs), performing the gait task of the Scale for the Assessment and Rating of Ataxia (SARA) from 11 medical sites located in 8 different states across the United States. We developed a computer vision pipeline to detect, track, and separate out the participants from their surroundings and constructed several features from their body pose coordinates to capture gait characteristics like step width, step length, swing, stability, speed, etc. Our risk-prediction model achieves 83.06% accuracy and an 80.23% F1 score. Similarly, our severity-assessment model achieves a mean absolute error (MAE) score of 0.6225 and a Pearson's correlation coefficient score of 0.7268. Our models still performed competitively when evaluated on data from sites not used during training. Furthermore, through feature importance analysis, we found that our models associate wider steps, decreased walking speed, and increased instability with greater ataxia severity, which is consistent with previously established clinical knowledge. Our models create possibilities for remote ataxia assessment in non-clinical settings in the future, which could significantly improve accessibility of ataxia care. Furthermore, our underlying dataset was assembled from a geographically diverse cohort, highlighting its potential to further increase equity. The code used in this study is open to the public, and the anonymized body pose landmark dataset is also available upon request.
    Unconditional Image-Text Pair Generation with Multimodal Cross Quantizer. (arXiv:2204.07537v1 [cs.CV])
    Though deep generative models have gained a lot of attention, most of the existing works are designed for the unimodal generation task. In this paper, we explore a new method for unconditional image-text pair generation. We propose MXQ-VAE, a vector quantization method for multimodal image-text representation. MXQ-VAE accepts a paired image and text as input, and learns a joint quantized representation space, so that the image-text pair can be converted to a sequence of unified indices. Then we can use autoregressive generative models to model the joint image-text representation, and even perform unconditional image-text pair generation. Extensive experimental results demonstrate that our approach effectively generates semantically consistent image-text pairs and also enhances meaningful alignment between image and text.  ( 2 min )
    Effects of Multi-Aspect Online Reviews with Unobserved Confounders: Estimation and Implication. (arXiv:2110.01746v2 [cs.LG] UPDATED)
    Online review systems are the primary means through which many businesses seek to build the brand and spread their messages. Prior research studying the effects of online reviews has been mainly focused on a single numerical cause, e.g., ratings or sentiment scores. We argue that such notions of causes entail three key limitations: they solely consider the effects of single numerical causes and ignore different effects of multiple aspects -- e.g., Food, Service -- embedded in the textual reviews; they assume the absence of hidden confounders in observational studies, e.g., consumers' personal preferences; and they overlook the indirect effects of numerical causes that can potentially cancel out the effect of textual reviews on business revenue. We thereby propose an alternative perspective to this single-cause-based effect estimation of online reviews: in the presence of hidden confounders, we consider multi-aspect textual reviews, particularly, their total effects on business revenue and direct effects with the numerical cause -- ratings -- being the mediator. We draw on recent advances in machine learning and causal inference to together estimate the hidden confounders and causal effects. We present empirical evaluations using real-world examples to discuss the importance and implications of differentiating the multi-aspect effects in strategizing business operations.  ( 2 min )
    Multi-domain Integrative Swin Transformer network for Sparse-View Tomographic Reconstruction. (arXiv:2111.14831v7 [eess.IV] UPDATED)
    Decreasing projection views to lower X-ray radiation dose usually leads to severe streak artifacts. To improve image quality from sparse-view data, a Multi-domain Integrative Swin Transformer network (MIST-net) was developed in this article. First, MIST-net incorporated lavish domain features from data, residual-data, image, and residual-image using flexible network architectures, where the residual-data and residual-image sub-networks were considered as data-consistency modules to eliminate interpolation and reconstruction errors. Second, a trainable edge enhancement filter was incorporated to detect and protect image edges. Third, a high-quality reconstruction Swin transformer (i.e., Recformer) was designed to capture image global features. The experiment results on numerical and real cardiac clinical datasets with 48 views demonstrated that our proposed MIST-net provided better image quality with more small features and sharp edges than other competitors.
    GCR: Gradient Coreset Based Replay Buffer Selection For Continual Learning. (arXiv:2111.11210v3 [cs.LG] UPDATED)
    Continual learning (CL) aims to develop techniques by which a single model adapts to an increasing number of tasks encountered sequentially, thereby potentially leveraging learnings across tasks in a resource-efficient manner. A major challenge for CL systems is catastrophic forgetting, where earlier tasks are forgotten while learning a new task. To address this, replay-based CL approaches maintain and repeatedly retrain on a small buffer of data selected across encountered tasks. We propose Gradient Coreset Replay (GCR), a novel strategy for replay buffer selection and update using a carefully designed optimization criterion. Specifically, we select and maintain a "coreset" that closely approximates the gradient of all the data seen so far with respect to current model parameters, and discuss key strategies needed for its effective application to the continual learning setting. We show significant gains (2%-4% absolute) over the state-of-the-art in the well-studied offline continual learning setting. Our findings also effectively transfer to online / streaming CL settings, showing up to 5% gains over existing approaches. Finally, we demonstrate the value of supervised contrastive loss for continual learning, which yields a cumulative gain of up to 5% accuracy when combined with our subset selection strategy.
    Learning to Accelerate by the Methods of Step-size Planning. (arXiv:2204.01705v3 [cs.LG] UPDATED)
    Gradient descent is slow to converge for ill-conditioned problems and non-convex problems. An important technique for acceleration is step-size adaptation. The first part of this paper contains a detailed review of step-size adaptation methods, including Polyak step-size, L4, LossGrad, Adam, IDBD, and Hypergradient descent, and the relation of step-size adaptation to meta-gradient methods. In the second part of this paper, we propose a new class of methods of accelerating gradient descent that have some distinctiveness from existing techniques. The new methods, which we call {\em step-size planning}, use the {\em update experience} to learn an improved way of updating the parameters. The methods organize the experience into $K$ steps away from each other to facilitate planning. From the past experience, our planning algorithm, Csawg, learns a step-size model which is a form of multi-step machine that predicts future updates. We extend Csawg to apply step-size planning over multiple steps, which leads to further speedup. We discuss and highlight the projection power of the diagonal-matrix step-size for future large-scale applications. We show for a convex problem, our methods can surpass the convergence rate of Nesterov's accelerated gradient, $1 - \sqrt{\mu/L}$, where $\mu, L$ are the strongly convex factor of the loss function $F$ and the Lipschitz constant of $F'$, which is the theoretical limit for the convergence rate of first-order methods. On the well-known non-convex Rosenbrock function, our planning methods achieve zero error below 500 gradient evaluations, while gradient descent takes about 10000 gradient evaluations to reach a $10^{-3}$ accuracy. We discuss the connection of step-size planning to planning in reinforcement learning, in particular, Dyna architectures.  ( 2 min )
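    As a point of reference for the review portion, the classical Polyak step size mentioned above sets the step to (f(x_t) - f*) / ||grad f(x_t)||^2, which requires knowing the optimal value f*. A minimal sketch (this is the classical rule, not the paper's Csawg planner; the helper name is hypothetical):

```python
def polyak_step_gd(grad_f, f, x, f_star=0.0, steps=200):
    """Gradient descent with the classical Polyak step size:
    eta_t = (f(x_t) - f_star) / ||grad f(x_t)||^2.
    Requires knowing the optimal value f_star (0 for this demo)."""
    for _ in range(steps):
        g = grad_f(x)
        gnorm2 = sum(gi * gi for gi in g)
        if gnorm2 == 0.0:  # already at a stationary point
            break
        eta = (f(x) - f_star) / gnorm2
        x = [xi - eta * gi for xi, gi in zip(x, g)]
    return x

# Ill-conditioned quadratic f(x) = x1^2 + 10 * x2^2, minimum 0 at the origin
f = lambda x: x[0] ** 2 + 10.0 * x[1] ** 2
grad_f = lambda x: [2.0 * x[0], 20.0 * x[1]]
x_final = polyak_step_gd(grad_f, f, [3.0, 1.0])  # converges close to the origin
```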
    Transfer Learning for Instance Segmentation of Waste Bottles using Mask R-CNN Algorithm. (arXiv:2204.07437v1 [cs.CV])
    This paper proposes a methodological approach with a transfer learning scheme for plastic waste bottle detection and instance segmentation using the \textit{mask region proposal convolutional neural network} (Mask R-CNN). Plastic bottles constitute one of the major pollutants posing a serious threat to the environment both in oceans and on land. The automated identification and segregation of bottles can facilitate plastic waste recycling. We prepare a custom-made dataset of 192 bottle images with pixel-by pixel-polygon annotation for the automatic segmentation task. The proposed transfer learning scheme makes use of a Mask R-CNN model pre-trained on the Microsoft COCO dataset. We present a comprehensive scheme for fine-tuning the base pre-trained Mask-RCNN model on our custom dataset. Our final fine-tuned model has achieved 59.4 \textit{mean average precision} (mAP), which corresponds to the MS COCO metric. The results indicate a promising application of deep learning for detecting waste bottles.  ( 2 min )
    Big-means: Less is More for K-means Clustering. (arXiv:2204.07485v1 [cs.LG])
    K-means clustering plays a vital role in data mining. However, its performance drastically drops when applied to huge amounts of data. We propose a new heuristic that is built on the basis of regular K-means for faster and more accurate big data clustering using the "less is more" and MSSC decomposition approaches. The main advantage of the proposed algorithm is that it naturally turns the K-means local search into a global one through the process of decomposition of the MSSC problem. On one hand, decomposition of the MSSC problem into smaller subproblems reduces the computational complexity and allows for their parallel processing. On the other hand, the MSSC decomposition provides a new method for the natural data-driven shaking of the incumbent solution while introducing a new neighborhood structure for the solution of the MSSC problem. This leads to a new heuristic that improves K-means in big data conditions. The scalability of the algorithm to big data can be easily adjusted by choosing the appropriate number of subproblems and their size. The proposed algorithm is both scalable and accurate. In our experiments it outperforms all recent state-of-the-art algorithms for the MSSC in terms of time as well as the solution quality.  ( 2 min )
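    A rough illustration of the "less is more" idea: run Lloyd's iterations on small random subsamples, warm-started from the incumbent centers, and keep whichever candidate scores best on the full data. This is an illustrative sketch under those assumptions, not the authors' exact algorithm, and all function names are hypothetical:

```python
import random

def sse(points, centers):
    """MSSC objective: total squared distance of each point to its
    nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(pt, ct)) for ct in centers)
        for pt in points
    )

def lloyd(points, centers, iters=10):
    """Plain Lloyd (K-means) iterations from a given warm start."""
    k = len(centers)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for pt in points:
            j = min(range(k),
                    key=lambda i: sum((p - c) ** 2 for p, c in zip(pt, centers[i])))
            clusters[j].append(pt)
        # Recompute centroids; keep the old center if a cluster empties out
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return centers

def big_means(points, k, sample_size, rounds=20, seed=0):
    """Decomposition sketch: solve K-means on small random subsamples,
    warm-starting each from the incumbent, and keep the candidate with
    the best objective on the full dataset."""
    rng = random.Random(seed)
    best = rng.sample(points, k)
    best_obj = sse(points, best)
    for _ in range(rounds):
        sample = rng.sample(points, sample_size)
        candidate = lloyd(sample, best)
        obj = sse(points, candidate)
        if obj < best_obj:
            best, best_obj = candidate, obj
    return best, best_obj

# Two well-separated blobs; subsamples of 6 points suffice to find them
pts = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
       [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0]]
centers, obj = big_means(pts, k=2, sample_size=6)
```

    The subproblem size (here `sample_size`) and number of rounds are the knobs the abstract refers to when discussing scalability.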
    Towards PAC Multi-Object Detection and Tracking. (arXiv:2204.07482v1 [cs.CV])
    Accurately detecting and tracking multi-objects is important for safety-critical applications such as autonomous navigation. However, it remains challenging to provide guarantees on the performance of state-of-the-art techniques based on deep learning. We consider a strategy known as conformal prediction, which predicts sets of labels instead of a single label; in the classification and regression settings, these algorithms can guarantee that the true label lies within the prediction set with high probability. Building on these ideas, we propose multi-object detection and tracking algorithms that come with probably approximately correct (PAC) guarantees. They do so by constructing both a prediction set around each object detection as well as around the set of edge transitions; given an object, the detection prediction set contains its true bounding box with high probability, and the edge prediction set contains its true transition across frames with high probability. We empirically demonstrate that our method can detect and track objects with PAC guarantees on the COCO and MOT-17 datasets.
    A Reinforcement Learning Approach to Parameter Selection for Distributed Optimal Power Flow. (arXiv:2110.11991v2 [eess.SY] UPDATED)
    With the increasing penetration of distributed energy resources, distributed optimization algorithms have attracted significant attention for power systems applications due to their potential for superior scalability, privacy, and robustness to a single point-of-failure. The Alternating Direction Method of Multipliers (ADMM) is a popular distributed optimization algorithm; however, its convergence performance is highly dependent on the selection of penalty parameters, which are usually chosen heuristically. In this work, we use reinforcement learning (RL) to develop an adaptive penalty parameter selection policy for the AC optimal power flow (ACOPF) problem solved via ADMM with the goal of minimizing the number of iterations until convergence. We train our RL policy using deep Q-learning, and show that this policy can result in significantly accelerated convergence (up to a 59% reduction in the number of iterations compared to existing, curvature-informed penalty parameter selection methods). Furthermore, we show that our RL policy demonstrates promise for generalizability, performing well under unseen loading schemes as well as under unseen losses of lines and generators (up to a 50% reduction in iterations). This work thus provides a proof-of-concept for using RL for parameter selection in ADMM for power systems applications.
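    For context on what the RL policy replaces, a widely used non-learned baseline for ADMM penalty selection is residual balancing (described in Boyd et al.'s ADMM monograph), which adjusts rho whenever the primal and dual residual norms drift apart; a sketch with the conventional defaults mu = 10, tau = 2:

```python
def residual_balance(rho, primal_res, dual_res, mu=10.0, tau=2.0):
    """Residual-balancing heuristic for the ADMM penalty parameter:
    grow rho when the primal residual dominates, shrink it when the
    dual residual dominates, otherwise leave it unchanged."""
    if primal_res > mu * dual_res:
        return rho * tau
    if dual_res > mu * primal_res:
        return rho / tau
    return rho

# Example: a large primal residual doubles the penalty
rho = residual_balance(1.0, primal_res=100.0, dual_res=1.0)
```

    The RL approach in the abstract can be read as learning a richer version of this update rule, tuned to ACOPF instances.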
    Sequential Aggregation and Rematerialization: Distributed Full-batch Training of Graph Neural Networks on Large Graphs. (arXiv:2111.06483v3 [cs.LG] UPDATED)
    We present the Sequential Aggregation and Rematerialization (SAR) scheme for distributed full-batch training of Graph Neural Networks (GNNs) on large graphs. Large-scale training of GNNs has recently been dominated by sampling-based methods and methods based on non-learnable message passing. SAR on the other hand is a distributed technique that can train any GNN type directly on an entire large graph. The key innovation in SAR is the distributed sequential rematerialization scheme which sequentially re-constructs then frees pieces of the prohibitively large GNN computational graph during the backward pass. This results in excellent memory scaling behavior where the memory consumption per worker goes down linearly with the number of workers, even for densely connected graphs. Using SAR, we report the largest applications of full-batch GNN training to-date, and demonstrate large memory savings as the number of workers increases. We also present a general technique based on kernel fusion and attention-matrix rematerialization to optimize both the runtime and memory efficiency of attention-based models. We show that, coupled with SAR, our optimized attention kernels lead to significant speedups and memory savings in attention-based GNNs. We made the SAR GNN training library publicly available: \url{https://github.com/IntelLabs/SAR}.
    Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records. (arXiv:2203.06918v2 [cs.CL] UPDATED)
    Question Answering on Electronic Health Records (EHR-QA) has a significant impact on the healthcare domain, and it is being actively studied. Previous research on structured EHR-QA focuses on converting natural language queries into a query language such as SQL or SPARQL (NLQ2Query), so the problem scope is limited to the data types pre-defined by the specific query language. In order to expand the EHR-QA task beyond this limitation to handle multi-modal medical data and solve complex inference in the future, a more primitive, systemic language is needed. In this paper, we design a program-based model (NLQ2Program) for EHR-QA as the first step towards that future direction. We tackle MIMICSPARQL*, the graph-based EHR-QA dataset, via a program-based approach in a semi-supervised manner in order to overcome the absence of gold programs. Without the gold program, our proposed model shows comparable performance to the previous state-of-the-art model, which is an NLQ2Query model (0.9% gain). In addition, for a reliable EHR-QA model, we apply the uncertainty decomposition method to measure the ambiguity in the input question. We empirically confirmed that data uncertainty is most indicative of the ambiguity in the input question.
    The Importance of Landscape Features for Performance Prediction of Modular CMA-ES Variants. (arXiv:2204.07431v1 [cs.NE])
    Selecting the most suitable algorithm and determining its hyperparameters for a given optimization problem is a challenging task. Accurately predicting how well a certain algorithm could solve the problem is hence desirable. Recent studies in single-objective numerical optimization show that supervised machine learning methods can predict algorithm performance using landscape features extracted from the problem instances. Existing approaches typically treat the algorithms as black boxes, without consideration of their characteristics. To investigate whether a selection of landscape features that depends on algorithm properties could further improve regression accuracy, we consider the modular CMA-ES framework and estimate how much each landscape feature contributes to the best algorithm performance regression models. Exploratory data analysis performed on this data indicates that the set of most relevant features does not depend on the configuration of individual modules, but the influence that these features have on regression accuracy does. In addition, we show that by using classifiers that take feature relevance into account, we are able to predict the status of individual modules in the CMA-ES configurations.  ( 2 min )
    Two-Step Meta-Learning for Time-Series Forecasting Ensemble. (arXiv:2011.10545v2 [stat.ML] UPDATED)
    The amount of historical data collected keeps increasing, and business-intelligence applications with automatic time-series forecasting are in high demand. While no single time-series modeling method is universal to all types of dynamics, forecasting with an ensemble of several methods is often seen as a compromise. Instead of fixing the ensemble's diversity and size, we propose to predict these aspects adaptively using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time-series features, to rank 22 univariate forecasting methods and to recommend the ensemble size. The forecasting ensemble is then formed from the best-ranked methods, and forecasts are pooled using either a simple or a weighted average (with weights corresponding to reciprocal ranks). The proposed approach was tested on 12561 micro-economic time series (expanded to 38633 for various forecasting horizons) from the M4 competition, where meta-learning outperformed the Theta and Comb benchmarks in relative forecasting error for all data types and horizons. The best overall results were achieved by weighted pooling, with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.  ( 2 min )
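    The reciprocal-rank pooling step is simple enough to sketch directly; the method forecasts and the sMAPE check below are invented for illustration, not values from the paper:

```python
def weighted_pool(forecasts, ensemble_size):
    """forecasts: per-method forecast lists, ordered best rank first.
    Pools the top `ensemble_size` methods with weights 1/rank."""
    chosen = forecasts[:ensemble_size]
    weights = [1.0 / (rank + 1) for rank in range(len(chosen))]
    total = sum(weights)
    horizon = len(chosen[0])
    return [
        sum(w * f[t] for w, f in zip(weights, chosen)) / total
        for t in range(horizon)
    ]

def smape(actual, predicted):
    """Symmetric mean absolute percentage error, in percent."""
    return 100.0 * sum(
        abs(a - p) / ((abs(a) + abs(p)) / 2.0)
        for a, p in zip(actual, predicted)
    ) / len(actual)

# Three hypothetical methods' 2-step-ahead forecasts, best rank first;
# pool the top two with weights 1 and 1/2.
pooled = weighted_pool([[10.0, 12.0], [14.0, 16.0], [9.0, 11.0]], ensemble_size=2)
```

The simple-average variant is the same pooling with all weights equal to one.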
    On the Importance of Firth Bias Reduction in Few-Shot Classification. (arXiv:2110.02529v2 [cs.CV] UPDATED)
    Learning accurate classifiers for novel categories from very few examples, known as few-shot image classification, is a challenging task in statistical machine learning and computer vision. The performance in few-shot classification suffers from the bias in the estimation of classifier parameters; however, an effective underlying bias reduction technique that could alleviate this issue in training few-shot classifiers has been overlooked. In this work, we demonstrate the effectiveness of Firth bias reduction in few-shot classification. Theoretically, Firth bias reduction removes the first-order $O(N^{-1})$ term from the small-sample bias of the Maximum Likelihood Estimator. Here we show that the general Firth bias reduction technique simplifies to encouraging uniform class assignment probabilities for multinomial logistic classification, and has almost the same effect in cosine classifiers. We derive an easy-to-implement optimization objective for Firth-penalized multinomial logistic and cosine classifiers, which is equivalent to penalizing the cross-entropy loss with a KL-divergence between the uniform label distribution and the predictions. Then, we empirically show that it is consistently effective across the board for few-shot image classification, regardless of (1) the feature representations from different backbones, (2) the number of samples per class, and (3) the number of classes. Finally, we show the robustness of Firth bias reduction in the case of imbalanced data distributions. Our implementation is available at https://github.com/ehsansaleh/firth_bias_reduction
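    The penalized objective described above has a compact closed form; a minimal sketch (the penalty coefficient and toy logits are illustrative choices, not values from the paper):

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def firth_penalized_ce(logits, label, lam):
    """Cross-entropy plus lam * KL(uniform || predictions): the
    Firth-penalized objective for multinomial logistic classification."""
    p = softmax(logits)
    k = len(p)
    ce = -math.log(p[label])
    kl = sum((1.0 / k) * math.log((1.0 / k) / pi) for pi in p)
    return ce + lam * kl

# With perfectly uniform predictions the KL penalty vanishes,
# so the loss reduces to the plain cross-entropy log(k).
loss = firth_penalized_ce([0.0, 0.0, 0.0], label=0, lam=1.0)
```

Any non-uniform prediction pays a strictly positive penalty, which is how the term nudges estimates toward uniform class-assignment probabilities.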
    Efficient Architecture Search for Diverse Tasks. (arXiv:2204.07554v1 [cs.LG])
    While neural architecture search (NAS) has enabled automated machine learning (AutoML) for well-researched areas, its application to tasks beyond computer vision is still under-explored. As less-studied domains are precisely those where we expect AutoML to have the greatest impact, in this work we study NAS for efficiently solving diverse problems. Seeking an approach that is fast, simple, and broadly applicable, we fix a standard convolutional network (CNN) topology and propose to search for the right kernel sizes and dilations its operations should take on. This dramatically expands the model's capacity to extract features at multiple resolutions for different types of data while only requiring search over the operation space. To overcome the efficiency challenges of naive weight-sharing in this search space, we introduce DASH, a differentiable NAS algorithm that computes the mixture-of-operations using the Fourier diagonalization of convolution, achieving both a better asymptotic complexity and an up-to-10x search time speedup in practice. We evaluate DASH on NAS-Bench-360, a suite of ten tasks designed for benchmarking NAS in diverse domains. DASH outperforms state-of-the-art methods in aggregate, attaining the best-known automated performance on seven tasks. Meanwhile, on six of the ten tasks, the combined search and retraining time is less than 2x slower than simply training a CNN backbone that is far less accurate.
    CryoRL: Reinforcement Learning Enables Efficient Cryo-EM Data Collection. (arXiv:2204.07543v1 [cs.LG])
    Single-particle cryo-electron microscopy (cryo-EM) has become one of the mainstream structural biology techniques because of its ability to determine high-resolution structures of dynamic bio-molecules. However, cryo-EM data acquisition remains expensive and labor-intensive, requiring substantial expertise. Structural biologists need a more efficient and objective method to collect the best data in a limited time frame. In this work, we formulate the cryo-EM data collection task as an optimization problem. The goal is to maximize the total number of good images taken within a specified period. We show that reinforcement learning offers an effective way to plan cryo-EM data collection, successfully navigating heterogeneous cryo-EM grids. The approach we developed, cryoRL, demonstrates better performance than average users for data collection under similar settings.
    Stretching Sentence-pair NLI Models to Reason over Long Documents and Clusters. (arXiv:2204.07447v1 [cs.CL])
    Natural Language Inference (NLI) has been extensively studied by the NLP community as a framework for estimating the semantic relation between sentence pairs. While early work identified certain biases in NLI models, recent advancements in modeling and datasets demonstrated promising performance. In this work, we further explore the direct zero-shot applicability of NLI models to real applications, beyond the sentence-pair setting they were trained on. First, we analyze the robustness of these models to longer and out-of-domain inputs. Then, we develop new aggregation methods to allow operating over full documents, reaching state-of-the-art performance on the ContractNLI dataset. Interestingly, we find NLI scores to provide strong retrieval signals, leading to more relevant evidence extractions compared to common similarity-based methods. Finally, we go further and investigate whole document clusters to identify both discrepancies and consensus among sources. In a test case, we find real inconsistencies between Wikipedia pages in different languages about the same topic.  ( 2 min )
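    One simple aggregation in this spirit, max-pooling per-sentence entailment scores over a document, can be sketched as follows; the paper's actual aggregation methods may differ, and the scores here stand in for hypothetical model outputs:

```python
def aggregate_nli(entail_scores):
    """entail_scores[i]: entailment probability of the hypothesis
    against premise sentence i of a document. Max-aggregation labels
    the document by its most entailing sentence; that same sentence
    doubles as the retrieved evidence, which is one way NLI scores
    can serve as a retrieval signal."""
    best = max(range(len(entail_scores)), key=lambda i: entail_scores[i])
    return entail_scores[best], best

# Hypothetical per-sentence scores for one document/hypothesis pair.
doc_score, evidence_idx = aggregate_nli([0.12, 0.91, 0.30])
```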
    Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation. (arXiv:2106.05527v4 [cs.LG] UPDATED)
    Recent advances in diffusion models have brought state-of-the-art performance on image generation tasks. However, empirical results from previous research on diffusion models imply an inverse correlation between performance on density estimation and on sample generation. This paper analyzes why: density estimation is contributed mostly by small diffusion times, whereas sample generation mainly depends on large diffusion times. However, training the score network on both small and large diffusion times is demanding because of a loss-imbalance issue. To successfully train the score network on both, this paper introduces a training technique, Soft Truncation, that softens the truncation time for every mini-batch update and is universally applicable to any type of diffusion model. It turns out that Soft Truncation is equivalent to a diffusion model with a general weight, and we prove the variational bound of the generally weighted diffusion model. In view of this variational bound, Soft Truncation becomes a natural way to train the score network. In experiments, Soft Truncation achieves state-of-the-art performance on the CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.  ( 2 min )
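    The softened-truncation sampling step can be sketched in a few lines; the heavy-tailed prior over the truncation time below is a hypothetical choice for illustration, not the paper's exact distribution:

```python
import random

def sample_diffusion_times(batch_size, t_min=1e-5, t_max=1.0, power=0.9):
    """Soft Truncation sketch: instead of a fixed smallest diffusion time,
    draw a fresh truncation time tau for each mini-batch from a prior
    concentrated near t_min (here an inverse-CDF sample of a power-law
    density proportional to tau**-power, an illustrative choice), then
    draw every example's diffusion time uniformly from [tau, t_max]."""
    u = random.random()
    a = 1.0 - power
    tau = (t_min ** a + u * (t_max ** a - t_min ** a)) ** (1.0 / a)
    return tau, [random.uniform(tau, t_max) for _ in range(batch_size)]
```

The score-matching loss for the mini-batch is then evaluated only at times above `tau`, so each update is truncated differently rather than always at one fixed epsilon.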
    Deep learning model solves change point detection for multiple change types. (arXiv:2204.07403v1 [cs.LG])
    Change point detection aims to catch an abrupt disorder in a data distribution. Common approaches assume that there are only two fixed distributions for the data: one before and another after a change point. Real-world data are richer than this assumption: there can be multiple different distributions before and after a change. We propose an approach that works in this multiple-distributions scenario. Our approach learns representations for semi-structured data suitable for change point detection, where a common classifier-based approach fails. Moreover, our model is more robust when predicting change points. The datasets used for benchmarking are sequences of images with and without change points in them.  ( 2 min )
    Characterizing metastable states with the help of machine learning. (arXiv:2204.07391v1 [physics.comp-ph])
    Present-day atomistic simulations generate long trajectories of ever more complex systems. Analyzing these data, discovering metastable states, and uncovering their nature is becoming increasingly challenging. In this paper, we first use the variational approach to conformation dynamics to discover the slowest dynamical modes of the simulations. This allows the different metastable states of the system to be located and organized hierarchically. The physical descriptors that characterize metastable states are discovered by means of a machine learning method. We show in the cases of two proteins, Chignolin and Bovine Pancreatic Trypsin Inhibitor, how such analysis can be effortlessly performed in a matter of seconds. Another strength of our approach is that it can be applied to the analysis of both unbiased and biased simulations.  ( 2 min )
    Enforcing fairness in private federated learning via the modified method of differential multipliers. (arXiv:2109.08604v2 [cs.LG] UPDATED)
    Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, \texttt{FPFL}, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.  ( 2 min )
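    The central-setting building block, the modified method of differential multipliers, can be sketched on a scalar toy constraint (all numbers are illustrative; the paper's federated variant additionally keeps user data on-device and privatizes the exchanged statistics):

```python
def mmdm_minimize(grad_f, g, grad_g, theta0, lam0=0.0, damping=1.0,
                  lr=0.05, steps=2000):
    """Modified method of differential multipliers, sketched in 1-D:
    minimize f(theta) subject to g(theta) = 0 by descending on theta
    and ascending on the multiplier lam, with a quadratic damping term
    damping * g * grad_g that stabilizes convergence to the constraint."""
    theta, lam = theta0, lam0
    for _ in range(steps):
        gt = g(theta)
        theta = theta - lr * (grad_f(theta) + lam * grad_g(theta)
                              + damping * gt * grad_g(theta))
        lam = lam + lr * gt
    return theta, lam

# Toy problem: minimize theta^2 subject to theta - 1 = 0; solution theta = 1.
theta_star, lam_star = mmdm_minimize(
    grad_f=lambda t: 2.0 * t,   # f(theta) = theta^2
    g=lambda t: t - 1.0,        # constraint g(theta) = theta - 1
    grad_g=lambda t: 1.0,
    theta0=0.0,
)
```

In the fairness setting, `g` would be the gap between group-wise losses rather than a scalar toy constraint.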
    Theory-inspired Parameter Control Benchmarks for Dynamic Algorithm Configuration. (arXiv:2202.03259v2 [cs.NE] UPDATED)
    It has long been observed that the performance of evolutionary algorithms and other randomized search heuristics can benefit from a non-static choice of the parameters that steer their optimization behavior. Mechanisms that identify suitable configurations on the fly ("parameter control") or via a dedicated training process ("dynamic algorithm configuration") are therefore an important component of modern evolutionary computation frameworks. Several approaches to address the dynamic parameter setting problem exist, but we barely understand which ones to prefer for which applications. As in classical benchmarking, problem collections with a known ground truth can offer very meaningful insights in this context. Unfortunately, settings with well-understood control policies are very rare. One of the few exceptions for which we know which parameter settings minimize the expected runtime is the LeadingOnes problem. We extend this benchmark by analyzing optimal control policies that can select the parameters only from a given portfolio of possible values. This also allows us to compute optimal parameter portfolios of a given size. We demonstrate the usefulness of our benchmarks by analyzing the behavior of the DDQN reinforcement learning approach for dynamic algorithm configuration.
    Simple but Effective: CLIP Embeddings for Embodied AI. (arXiv:2111.09888v2 [cs.CV] UPDATED)
    Contrastive language image pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks from classification and detection to captioning and image manipulation. We investigate the effectiveness of CLIP visual backbones for Embodied AI tasks. We build incredibly simple baselines, named EmbCLIP, with no task-specific architectures, inductive biases (such as the use of semantic maps), auxiliary tasks during training, or depth maps -- yet we find that our improved baselines perform very well across a range of tasks and simulators. EmbCLIP tops the RoboTHOR ObjectNav leaderboard by a huge margin of 20 pts (Success Rate). It tops the iTHOR 1-Phase Rearrangement leaderboard, beating the next best submission, which employs Active Neural Mapping, and more than doubling the % Fixed Strict metric (0.08 to 0.17). It also beats the winners of the 2021 Habitat ObjectNav Challenge, which employ auxiliary tasks, depth maps, and human demonstrations, and those of the 2019 Habitat PointNav Challenge. We evaluate the ability of CLIP's visual representations to capture semantic information about input observations -- primitives that are useful for navigation-heavy embodied tasks -- and find that CLIP's representations encode these primitives more effectively than ImageNet-pretrained backbones. Finally, we extend one of our baselines, producing an agent capable of zero-shot object navigation that can navigate to objects that were not used as targets during training. Our code and models are available at https://github.com/allenai/embodied-clip  ( 2 min )
    Nanorobot queue: Cooperative treatment of cancer based on team member communication and image processing. (arXiv:2111.11236v3 [cs.RO] UPDATED)
    Although nanorobots have been used clinically for tasks such as gastroscopy, photoacoustic tomography has been proposed to steer nanorobots to deliver drugs at designated points in real time, and there are cases of eliminating "superbacteria" in blood with nanorobots, most of these technologies are immature: they suffer from low efficiency or low accuracy, or cannot be mass-produced. The most effective way to treat cancer at this stage therefore remains chemotherapy and radiotherapy, under which patients suffer and cannot be cured. This paper proposes an idealized model of a treatment method intended to cure cancer completely: a cooperative treatment method based on a nanorobot queue with team-member communication and computer-vision image classification (target detection).
    Grassmannian Optimization for Online Tensor Completion and Tracking with the t-SVD. (arXiv:2001.11419v4 [eess.SP] UPDATED)
    We propose a new fast streaming algorithm for the tensor completion problem of imputing missing entries of a low-tubal-rank tensor using the tensor singular value decomposition (t-SVD) algebraic framework. We show the t-SVD is a specialization of the well-studied block-term decomposition for third-order tensors, and we present an algorithm under this model that can track changing free submodules from incomplete streaming 2-D data. The proposed algorithm uses principles from incremental gradient descent on the Grassmann manifold of subspaces to solve the tensor completion problem with linear complexity and constant memory in the number of time samples. We provide a local expected linear convergence result for our algorithm. Our empirical results are competitive in accuracy but much faster in compute time than state-of-the-art tensor completion algorithms on real applications to recover temporal chemo-sensing and MRI data under limited sampling.
    An interpretable machine learning approach for ferroalloys consumptions. (arXiv:2204.07421v1 [cs.LG])
    This paper is devoted to a practical method for modeling and optimizing ferroalloy consumption. We consider the problem of selecting optimal process control parameters based on the analysis of historical sensor data. We developed an approach that predicts the results of chemical reactions and gives ferroalloy consumption recommendations. The main features of our method are easy interpretation and noise resistance. Our approach is based on the k-means clustering algorithm, decision trees, and linear regression. The main idea of the method is to identify situations where processes behave similarly. For this, we propose a k-means-based dataset clustering algorithm and a classification algorithm to determine the cluster. This algorithm can also be applied to various technological processes; in this article, we demonstrate its application in metallurgy. To test the proposed method, we used it to optimize ferroalloy consumption in Basic Oxygen Furnace steelmaking when finishing steel in a ladle furnace. The minimum required element content for a given steel grade was selected as the predictive model's target variable, and the required amount of the element to be added to the melt as the optimized variable. Keywords: Clustering, Machine Learning, Linear Regression, Steelmaking, Optimization, Gradient Boosting, Artificial Intelligence, Decision Trees, Recommendation services
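    The "cluster similar situations, then fit a simple model per cluster" idea can be sketched with stdlib-only code; the 1-D features and (x, y) pairs below are invented stand-ins for process-sensor readings, not real plant data:

```python
import random

def kmeans_1d(xs, k, iters=50, seed=0):
    """Plain 1-D k-means: a toy stand-in for clustering sensor vectors
    into groups of 'situations where processes behave similarly'."""
    rng = random.Random(seed)
    centers = rng.sample(xs, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in xs:
            groups[min(range(k), key=lambda j: abs(x - centers[j]))].append(x)
        # keep the old center if a group happens to be empty
        centers = [sum(g) / len(g) if g else centers[j]
                   for j, g in enumerate(groups)]
    return centers

def fit_line(points):
    """Least-squares line y = a*x + b, fit to one cluster's history."""
    n = len(points)
    sx = sum(x for x, _ in points); sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points); sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return a, (sy - a * sx) / n

# Two well-separated "situations", then a per-cluster linear model.
centers = kmeans_1d([0.0, 0.1, 10.0, 10.1], k=2)
model = fit_line([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

At prediction time, a new situation is assigned to its nearest cluster and the corresponding per-cluster regression produces the consumption recommendation.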
    Tighter Theory for Local SGD on Identical and Heterogeneous Data. (arXiv:1909.04746v4 [cs.LG] UPDATED)
    We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug in $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
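    The algorithm under analysis is simple enough to sketch in full; the 1-D least-squares objective, worker data, and step size below are toy choices for illustration:

```python
def local_sgd(data_per_worker, h, rounds, lr=0.1, x0=0.0):
    """Local SGD sketch on a 1-D least-squares objective
    0.5 * mean_j (x - d_j)^2: each worker runs h local gradient steps
    on its own data, then the iterates are averaged. With h = 1 this
    reduces to plain synchronized SGD, matching the H = 1 recovery of
    known results mentioned above."""
    x = x0
    for _ in range(rounds):
        local_iterates = []
        for data in data_per_worker:
            xi = x
            for _ in range(h):
                xi -= lr * sum(xi - d for d in data) / len(data)
            local_iterates.append(xi)
        x = sum(local_iterates) / len(local_iterates)  # communication round
    return x

# Heterogeneous regime: two workers with different local optima (0 and 2);
# the averaged iterate converges to the global optimum 1.
x_het = local_sgd([[0.0], [2.0]], h=4, rounds=200)
```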
    SuperCone: Unified User Segmentation over Heterogeneous Experts via Concept Meta-learning. (arXiv:2203.07029v2 [cs.LG] UPDATED)
    We study the problem of user segmentation: given a set of users and one or more predefined groups or segments, assign users to their corresponding segments. As an example, for a segment indicating particular interest in a certain area of sports or entertainment, the task is to predict whether each single user will belong to the segment. However, there may exist numerous long-tail prediction tasks that suffer from limited data availability and may be heterogeneous in nature, which makes them hard to capture with single off-the-shelf model architectures. In this work, we present SuperCone, our unified predictive segments system that addresses the above challenges. It builds on top of a flat concept representation that summarizes each user's heterogeneous digital footprints, and uniformly models each prediction task using an approach called "super learning", that is, combining prediction models with diverse architectures or learning methods that are not compatible with each other. Following this, we provide an end-to-end approach that learns to flexibly attend to the best-suited heterogeneous experts adaptively, while at the same time incorporating deep representations of the input concepts to augment the above experts. Experiments show that SuperCone significantly outperforms state-of-the-art recommendation and ranking algorithms on a wide range of predictive segment tasks and public structured-data learning benchmarks.
    Safe Reinforcement Learning Using Black-Box Reachability Analysis. (arXiv:2204.07417v1 [cs.RO])
    Reinforcement learning (RL) is capable of sophisticated motion planning and control for robots in uncertain environments. However, state-of-the-art deep RL approaches typically lack safety guarantees, especially when the robot and environment models are unknown. To justify widespread deployment, robots must respect safety constraints without sacrificing performance. Thus, we propose a Black-box Reachability-based Safety Layer (BRSL) with three main components: (1) data-driven reachability analysis for a black-box robot model, (2) a trajectory rollout planner that predicts future actions and observations using an ensemble of neural networks trained online, and (3) a differentiable polytope collision check between the reachable set and obstacles that enables correcting unsafe actions. In simulation, BRSL outperforms other state-of-the-art safe RL methods on a Turtlebot 3, a quadrotor, and a trajectory-tracking point mass with an unsafe set adjacent to the area of highest reward.
    Model-Based Deep Learning of Joint Probabilistic and Geometric Shaping for Optical Communication. (arXiv:2204.07457v1 [eess.SP])
    Autoencoder-based deep learning is applied to jointly optimize geometric and probabilistic constellation shaping for optical coherent communication. The optimized constellation shaping outperforms the 256 QAM Maxwell-Boltzmann probabilistic distribution with extra 0.05 bits/4D-symbol mutual information for 64 GBd transmission over 170 km SMF link.
    A Machine Learning Tutorial for Operational Meteorology, Part I: Traditional Machine Learning. (arXiv:2204.07492v1 [physics.ao-ph])
    Recently, the use of machine learning in meteorology has increased greatly. While many machine learning methods are not new, university classes on machine learning are largely unavailable to meteorology students and are not required to become a meteorologist. The lack of formal instruction has contributed to the perception that machine learning methods are 'black boxes', and thus end-users are hesitant to apply them in their everyday workflow. To reduce the opaqueness of machine learning methods and lower hesitancy towards machine learning in meteorology, this paper provides a survey of some of the most common machine learning methods. A familiar meteorological example is used to contextualize the machine learning methods while also discussing machine learning topics using plain language. The following machine learning methods are demonstrated: linear regression; logistic regression; decision trees; random forest; gradient boosted decision trees; naive Bayes; and support vector machines. Beyond discussing the different methods, the paper also contains discussions on the general machine learning process as well as best practices to enable readers to apply machine learning to their own datasets. Furthermore, all code (in the form of Jupyter notebooks and Google Colaboratory notebooks) used to make the examples in the paper is provided in an effort to catalyse the use of machine learning in meteorology.
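    As a flavor of one of the surveyed methods, here is logistic regression fit from scratch by gradient descent on an invented 1-D "rain / no rain" example (x might stand for normalized relative humidity; the numbers are illustrative, not from the paper's dataset):

```python
import math

def train_logistic(xs, ys, lr=0.5, epochs=500):
    """Binary logistic regression, p = sigmoid(w*x + b), fit by full-batch
    gradient descent on the cross-entropy loss."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (p - y) * x   # gradient of cross-entropy w.r.t. w
            gb += (p - y)       # gradient w.r.t. b
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b

# Toy data: higher humidity -> rain (label 1).
w, b = train_logistic([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1])
```

The fitted model should assign a rain probability above 0.5 to high-humidity inputs and below 0.5 to low-humidity ones.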
    Latent Gaussian Model Boosting. (arXiv:2105.08966v4 [cs.LG] UPDATED)
    Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.
    Super Resolution for Turbulent Flows in 2D: Stabilized Physics Informed Neural Networks. (arXiv:2204.07413v1 [math.NA])
    We propose a new design of a neural network for solving a zero shot super resolution problem for turbulent flows. We embed Luenberger-type observer into the network's architecture to inform the network of the physics of the process, and to provide error correction and stabilization mechanisms. In addition, to compensate for decrease of observer's performance due to the presence of unknown destabilizing forcing, the network is designed to estimate the contribution of the unknown forcing implicitly from the data over the course of training. By running a set of numerical experiments, we demonstrate that the proposed network does recover unknown forcing from data and is capable of predicting turbulent flows in high resolution from low resolution noisy observations.  ( 2 min )
    Invariance Through Inference. (arXiv:2112.08526v2 [cs.LG] UPDATED)
    We introduce a general approach, called Invariance through Inference, for improving the test-time performance of an agent in deployment environments with unknown perceptual variations. Instead of producing invariant visual features through interpolation, invariance through inference turns adaptation at deployment-time into an unsupervised learning problem. This is achieved in practice by deploying a straightforward algorithm that tries to match the distribution of latent features to the agent's prior experience, without relying on paired data. Although simple, we show that this idea leads to surprising improvements on a variety of adaptation scenarios without access to deployment-time rewards, including changes in scene content, camera poses, and lighting conditions. We present results on challenging domains including distractor control suite and sim-to-real transfer for image-based robot manipulation.
    Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information. (arXiv:2204.05255v2 [cs.CR] UPDATED)
    Backdoor attacks insert malicious data into a training set so that, during inference time, the model misclassifies inputs that have been patched with a backdoor trigger as an attacker-specified target label. For backdoor attacks to bypass human inspection, it is essential that the injected data appear to be correctly labeled. Attacks with this property are often referred to as "clean-label attacks." Existing clean-label backdoor attacks require knowledge of the entire training set to be effective. Obtaining such knowledge is difficult or impossible because training data are often gathered from multiple sources (e.g., face images from different users). It remains a question whether backdoor attacks still present a real threat. This paper provides an affirmative answer to this question by designing an algorithm to mount clean-label backdoor attacks based only on knowledge of representative examples from the target class. With poisoning equal to or less than 0.5% of the target-class data and 0.05% of the training set, we can train a model to classify test examples from arbitrary classes into the target class when the examples are patched with a backdoor trigger. Our attack works well across datasets and models, even when the trigger is present in the physical world. We explore the space of defenses and find that, surprisingly, our attack can evade the latest state-of-the-art defenses in their vanilla form, or, after a simple twist, can adapt to the downstream defenses. We study the cause of this intriguing effectiveness and find that, because the trigger synthesized by our attack contains features as persistent as the original semantic features of the target class, any attempt to remove such triggers would inevitably hurt model accuracy first.
    Experimentally realized memristive memory augmented neural network. (arXiv:2204.07429v1 [cs.ET])
    Lifelong on-device learning is a key challenge for machine intelligence, and it requires learning from few, often single, samples. Memory-augmented neural networks have been proposed to achieve this goal, but the memory module must be stored in off-chip memory due to its size, so practical use has been heavily limited. Previous emerging-memory-based implementations have had difficulty scaling up, because modules with different structures are hard to integrate on the same chip, and because the small sense margin of the content-addressable memory used for the memory module severely limits the degree-of-mismatch calculation. In this work, we implement the entire memory-augmented neural network architecture in a fully integrated memristive crossbar platform and achieve an accuracy that closely matches standard software on digital hardware for the Omniglot dataset. The successful demonstration is supported by implementing new functions in crossbars beyond the widely reported matrix multiplications. For example, the locality-sensitive hashing operation is implemented in crossbar arrays by exploiting the intrinsic stochasticity of memristor devices. In addition, the content-addressable memory module is realized in crossbars and also supports degree-of-mismatch computation. Simulations based on experimentally validated models show that such an implementation can be efficiently scaled up for one-shot learning on the Mini-ImageNet dataset. The successful demonstration paves the way for practical on-device lifelong learning and opens possibilities for novel attention-based algorithms not possible in conventional hardware.
    Rethinking Machine Learning Model Evaluation in Pathology. (arXiv:2204.05205v2 [eess.IV] UPDATED)
    Machine Learning has been applied to pathology images in research and clinical practice with promising outcomes. However, standard ML models often lack the rigorous evaluation required for clinical decisions. Machine learning techniques for natural images are ill-equipped to deal with pathology images that are significantly large and noisy, require expensive labeling, are hard to interpret, and are susceptible to spurious correlations. We propose a set of practical guidelines for ML evaluation in pathology that address the above concerns. The paper includes measures for setting up the evaluation framework, effectively dealing with variability in labels, and a recommended suite of tests to address issues related to domain shift, robustness, and confounding variables. We hope that the proposed framework will bridge the gap between ML researchers and domain experts, leading to wider adoption of ML techniques in pathology and improving patient outcomes.
    Synthesizing Informative Training Samples with GAN. (arXiv:2204.07513v1 [cs.LG])
    Remarkable progress has been achieved in synthesizing photo-realistic images with generative adversarial networks (GANs). Recently, GANs have been utilized as training-sample generators when obtaining or storing real training data is expensive or even infeasible. However, images generated by traditional GANs are not as informative as real training samples when used to train deep neural networks. In this paper, we propose a novel method to synthesize Informative Training samples with a GAN (IT-GAN). Specifically, we freeze a pre-trained GAN model and learn the informative latent vectors that correspond to informative training samples. The synthesized images are required to preserve information useful for training deep neural networks rather than visual realism or fidelity. Experiments verify that deep neural networks can learn faster and achieve better performance when trained with our IT-GAN-generated images. We also show that our method is a promising solution to the dataset condensation problem.
    Streaming Align-Refine for Non-autoregressive Deliberation. (arXiv:2204.07556v1 [cs.CL])
    We propose a streaming non-autoregressive (non-AR) decoding algorithm to deliberate the hypothesis alignment of a streaming RNN-T model. Our algorithm facilitates a simple greedy decoding procedure, and at the same time is capable of producing the decoding result at each frame with limited right context, thus enjoying both high efficiency and low latency. These advantages are achieved by converting the offline Align-Refine algorithm to be streaming-compatible, with a novel transformer decoder architecture that performs local self-attentions for both text and audio, and a time-aligned cross-attention at each layer. Furthermore, we perform discriminative training of our model with the minimum word error rate (MWER) criterion, which has not been done in the non-AR decoding literature. Experiments on voice search datasets and Librispeech show that with reasonable right context, our streaming model performs as well as the offline counterpart, and discriminative training leads to further WER gain when the first-pass model has small capacity.
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v3 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often involves more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions concern hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and the Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We link the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.
    Novelty Search in Representational Space for Sample Efficient Exploration. (arXiv:2009.13579v3 [cs.LG] UPDATED)
    We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low-dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information-theoretic principles to shape our representations such that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as on a control problem, and show that our exploration approach is more sample-efficient compared to strong baselines.
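    A minimal sketch of the nearest-neighbor novelty signal (the Euclidean metric and the choice of k are assumptions, and the paper's learned encoder is replaced here by raw coordinates):

```python
import math

def knn_novelty(z, memory, k=3):
    """Intrinsic reward: mean Euclidean distance from encoding z to its
    k nearest neighbors among previously visited encodings."""
    if not memory:
        return float("inf")  # everything is novel before anything is visited
    dists = sorted(math.dist(z, m) for m in memory)
    return sum(dists[:k]) / min(k, len(dists))
```

    States far from everything visited so far receive large intrinsic rewards, which is what drives exploration toward unvisited regions of the representational space.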
    INSTA-BNN: Binary Neural Network with INSTAnce-aware Threshold. (arXiv:2204.07439v1 [cs.CV])
    Binary Neural Networks (BNNs) have emerged as a promising solution for reducing the memory footprint and compute costs of deep neural networks. However, BNNs suffer from information loss because binary activations are limited to only two values, resulting in reduced accuracy. To improve accuracy, previous studies have attempted to control the distribution of binary activations by manually shifting the threshold of the activation function or by making the shift amount trainable. In doing so, they usually depended on statistical information computed over a batch. We argue that batch statistics fail to capture information crucial to each input instance in BNN computations, and that the difference between batch-level and instance-level statistics needs to be considered when determining each instance's binary activation threshold. Based on this concept, we propose the Binary Neural Network with INSTAnce-aware threshold (INSTA-BNN), which decides the activation threshold value by considering the difference between statistics computed over a batch and over each instance. The proposed INSTA-BNN outperforms the baseline by 2.5% and 2.3% on the ImageNet classification task with comparable computing cost, achieving 68.0% and 71.7% top-1 accuracy on ResNet-18- and MobileNetV1-based models, respectively.
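    A toy sketch of the instance-aware idea (the linear functional form and the scale `alpha` are hypothetical stand-ins; the paper's actual threshold function is learned):

```python
def instance_aware_threshold(instance_acts, batch_mean, base=0.0, alpha=0.5):
    """Shift the binarization threshold per instance by the gap between this
    instance's mean pre-activation and the batch mean (alpha is a hypothetical
    learnable scale)."""
    inst_mean = sum(instance_acts) / len(instance_acts)
    return base + alpha * (inst_mean - batch_mean)

def binarize(instance_acts, batch_mean):
    """Binarize pre-activations with an instance-dependent threshold."""
    t = instance_aware_threshold(instance_acts, batch_mean)
    return [1 if a >= t else -1 for a in instance_acts]
```

    Two inputs with identical batch statistics but different per-instance means thus get different thresholds, which is exactly the information a purely batch-level shift discards.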
    Neural Structured Prediction for Inductive Node Classification. (arXiv:2204.07524v1 [cs.LG])
    This paper studies node classification in the inductive setting, i.e., aiming to learn a model on labeled training graphs and generalize it to infer node labels on unlabeled test graphs. This problem has been extensively studied with graph neural networks (GNNs) by learning effective node representations, as well as traditional structured prediction methods for modeling the structured output of node labels, e.g., conditional random fields (CRFs). In this paper, we present a new approach called the Structured Proxy Network (SPN), which combines the advantages of both worlds. SPN defines flexible potential functions of CRFs with GNNs. However, learning such a model is nontrivial as it involves optimizing a maximin game with high-cost inference. Inspired by the underlying connection between joint and marginal distributions defined by Markov networks, we propose to solve an approximate version of the optimization problem as a proxy, which yields a near-optimal solution, making learning more efficient. Extensive experiments on two settings show that our approach outperforms many competitive baselines.
    Selecting Continuous Life-Like Cellular Automata for Halting Unpredictability: Evolving for Abiogenesis. (arXiv:2204.07541v1 [cs.NE])
    Substantial efforts have been applied to engineer CA with desired emergent properties, such as supporting gliders. Recent work in continuous CA has generated a wide variety of compelling bioreminiscent patterns, and the expansion of CA research into continuous numbers, multiple channels, and higher dimensions complicates their study. In this work we devise a strategy for evolving CA and CA patterns in two steps, based on the simple idea that CA are likely to be complex and computationally capable if they support both patterns that grow indefinitely and patterns that vanish completely, with the difference being difficult to predict in advance. The second part of our strategy evolves patterns by selecting for mobility and conservation of mean cell value. We validate our pattern-evolution method by re-discovering gliders in 17 of 17 Lenia CA, and we also report 5 new evolved CA that support evolved glider patterns differing from previously reported Lenia patterns. The CA reported here share neighborhood kernels with previously described Lenia CA, but exhibit a wider range of typical dynamics than their Lenia counterparts. Code for evolving continuous CA is made available under an MIT License.
    Deep Learning-based List Sphere Decoding for Faster-than-Nyquist (FTN) Signaling Detection. (arXiv:2204.07569v1 [cs.IT])
    Faster-than-Nyquist (FTN) signaling is a candidate non-orthonormal transmission technique to improve the spectral efficiency (SE) of future communication systems. However, such SE improvements come at the cost of additional computational complexity to remove the intentionally introduced intersymbol interference. In this paper, we investigate the use of deep learning (DL) to reduce the detection complexity of FTN signaling. To eliminate the need for a noise-whitening filter at the receiver, we first present an equivalent FTN signaling model based on a set of orthonormal basis functions and identify its operation region. Second, we propose a DL-based list sphere decoding (DL-LSD) algorithm that selects and updates the initial radius of the original LSD to guarantee a pre-defined number $N_{\text{L}}$ of lattice points inside the hypersphere. This is achieved by training a neural network to output an approximate initial radius that encloses $N_{\text{L}}$ lattice points. At the testing phase, if the hypersphere contains more than $N_{\text{L}}$ lattice points, we keep the $N_{\text{L}}$ points closest to the point corresponding to the received FTN signal; if it contains fewer than $N_{\text{L}}$ points, we increase the approximate initial radius by a value that depends on the standard deviation of the distribution of output radii from the training phase. Then, the approximate value of the log-likelihood ratio (LLR) is calculated based on the obtained $N_{\text{L}}$ points. Simulation results show that the computational complexity of the proposed DL-LSD is lower than that of the original LSD by orders of magnitude.
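    The radius-update logic can be sketched as follows (the toy integer lattice, the growth constant `c`, and the use of the population standard deviation of training-phase radii are illustrative assumptions; the predicted radius would come from the trained network):

```python
import math
import statistics

def lsd_candidates(received, lattice, radius):
    """Lattice points inside the hypersphere of the given radius."""
    return [p for p in lattice if math.dist(received, p) <= radius]

def dl_lsd_select(received, lattice, predicted_radius, n_l, train_radii, c=1.0):
    """Keep exactly n_l lattice points: if the sphere is over-full, take the
    n_l closest; if it is under-full, grow the radius by c times the std of
    the training-phase radius distribution until n_l points are enclosed."""
    radius = predicted_radius
    pts = lsd_candidates(received, lattice, radius)
    while len(pts) < n_l:
        radius += c * statistics.pstdev(train_radii)
        pts = lsd_candidates(received, lattice, radius)
    pts.sort(key=lambda p: math.dist(received, p))
    return pts[:n_l]
```

    The approximate LLR would then be computed from the distances of these n_l points to the received signal.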
    Accurate ADMET Prediction with XGBoost. (arXiv:2204.07532v1 [q-bio.BM])
    The absorption, distribution, metabolism, excretion, and toxicity (ADMET) properties are important in drug discovery as they define efficacy and safety. Here, we apply an ensemble of features, including fingerprints and descriptors, and a tree-based machine learning model, extreme gradient boosting, for accurate ADMET prediction. Our model performs well in the Therapeutics Data Commons ADMET benchmark group. For 22 tasks, our model is ranked first in 10 tasks and top 3 in 18 tasks.
    Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model. (arXiv:2111.00009v2 [eess.AS] UPDATED)
    In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder which is applied on each speaker-specific output stream separately. In this work, we argue that such a scheme is sub-optimal and propose a principled solution that decodes all speakers jointly. We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers. We employ a joint decoder that can make use of this uncertainty together with higher-level language information. For this, we revisit decoding algorithms used in factorial generative models in early multi-talker speech recognition systems. In contrast with these early works, we replace the GMM acoustic model with DNN, which provides greater modeling power and simplifies part of the inference. We demonstrate the advantage of joint decoding in proof of concept experiments on a mixed-TIDIGITS dataset.
    Adjoined Networks: A Training Paradigm with Applications to Network Compression. (arXiv:2006.05624v5 [cs.LG] UPDATED)
    Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet dataset. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50-level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.
    Barwise Compression Schemes for Audio-Based Music Structure Analysis. (arXiv:2202.04981v2 [cs.SD] UPDATED)
    Music Structure Analysis (MSA) consists in segmenting a music piece into several distinct sections. We approach MSA within a compression framework, under the hypothesis that the structure is more easily revealed by a simplified representation of the original content of the song. More specifically, under the hypothesis that MSA is correlated with similarities occurring at the bar scale, this article introduces the use of linear and non-linear compression schemes on barwise audio signals. Compressed representations capture the most salient components of the different bars in the song and are then used to infer the song structure using a dynamic programming algorithm. This work explores both low-rank approximation models, such as Principal Component Analysis and Nonnegative Matrix Factorization, and "piece-specific" auto-encoding neural networks, with the objective of learning latent representations specific to a given song. Such approaches do not rely on supervision or annotations, which are well known to be tedious to collect and possibly ambiguous in MSA description. In our experiments, several unsupervised compression schemes achieve a level of performance comparable to that of state-of-the-art supervised methods (for 3s tolerance) on the RWC-Pop dataset, showcasing the importance of barwise compression processing for MSA.
    GitTables: A Large-Scale Corpus of Relational Tables. (arXiv:2106.07258v4 [cs.DB] UPDATED)
    The success of deep learning has sparked interest in improving relational table tasks, like data preparation and search, with table representation models trained on large table corpora. Existing table corpora primarily contain tables extracted from HTML pages, limiting the capability to represent offline database tables. To train and evaluate high-capacity models for applications beyond the Web, we need resources with tables that resemble relational database tables. Here we introduce GitTables, a corpus of 1M relational tables extracted from GitHub. Our continuing curation aims at growing the corpus to at least 10M tables. Analyses of GitTables show that its structure, content, and topical coverage differ significantly from existing table corpora. We annotate table columns in GitTables with semantic types, hierarchical relations and descriptions from Schema.org and DBpedia. The evaluation of our annotation pipeline on the T2Dv2 benchmark illustrates that our approach provides results on par with human annotations. We present three applications of GitTables, demonstrating its value for learned semantic type detection models, schema completion methods, and benchmarks for table-to-KG matching, data search, and preparation. We make the corpus and code available at https://gittables.github.io.
    NICE: Robust Scheduling through Reinforcement Learning-Guided Integer Programming. (arXiv:2109.12171v3 [cs.LG] UPDATED)
    Integer programs provide a powerful abstraction for representing a wide range of real-world scheduling problems. Despite their ability to model general scheduling problems, solving large-scale integer programs (IP) remains a computational challenge in practice. The incorporation of more complex objectives such as robustness to disruptions further exacerbates the computational challenge. We present NICE (Neural network IP Coefficient Extraction), a novel technique that combines reinforcement learning and integer programming to tackle the problem of robust scheduling. More specifically, NICE uses reinforcement learning to approximately represent complex objectives in an integer programming formulation. We use NICE to determine assignments of pilots to a flight crew schedule so as to reduce the impact of disruptions. We compare NICE with (1) a baseline integer programming formulation that produces a feasible crew schedule, and (2) a robust integer programming formulation that explicitly tries to minimize the impact of disruptions. Our experiments show that, across a variety of scenarios, NICE produces schedules resulting in 33% to 48% fewer disruptions than the baseline formulation. Moreover, in more severely constrained scheduling scenarios in which the robust integer program fails to produce a schedule within 90 minutes, NICE is able to build robust schedules in less than 2 seconds on average.
    Approximating Gradients for Differentiable Quality Diversity in Reinforcement Learning. (arXiv:2202.03666v2 [cs.LG] UPDATED)
    Consider the problem of training robustly capable agents. One approach is to generate a diverse collection of agent policies. Training can then be viewed as a quality diversity (QD) optimization problem, where we search for a collection of performant policies that are diverse with respect to quantified behavior. Recent work shows that differentiable quality diversity (DQD) algorithms greatly accelerate QD optimization when exact gradients are available. However, agent policies typically assume that the environment is not differentiable. To apply DQD algorithms to training agent policies, we must approximate gradients for performance and behavior. We propose two variants of the current state-of-the-art DQD algorithm that compute gradients via approximation methods common in reinforcement learning (RL). We evaluate our approach on four simulated locomotion tasks. One variant achieves results comparable to the current state-of-the-art in combining QD and RL, while the other performs comparably in two locomotion tasks. These results provide insight into the limitations of current DQD algorithms in domains where gradients must be approximated. Source code is available at https://github.com/icaros-usc/dqd-rl
    Universal approximation property of invertible neural networks. (arXiv:2204.07415v1 [cs.LG])
    Invertible neural networks (INNs) are neural network architectures with invertibility by design. Thanks to their invertibility and the tractability of their Jacobians, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a question about their representation power: can we use these models to approximate sufficiently diverse functions? To answer this question, we have developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem of differential geometry. The framework simplifies the approximation problem of diffeomorphisms, which enables us to show the universal approximation properties of INNs. We apply the framework to two representative classes of INNs, namely Coupling-Flow-based INNs (CF-INNs) and Neural Ordinary Differential Equations (NODEs), and elucidate their high representation power despite the restrictions on their architectures.
    Sparsifying the Update Step in Graph Neural Networks. (arXiv:2109.00909v3 [cs.LG] UPDATED)
    Message-Passing Neural Networks (MPNNs), the most prominent Graph Neural Network (GNN) framework, celebrate much success in the analysis of graph-structured data. Concurrently, the sparsification of neural network models attracts a great amount of academic and industrial interest. In this paper we conduct a structured, empirical study of the effect of sparsification on the trainable part of MPNNs known as the Update step. To this end, we design a series of models to successively sparsify the linear transform in the Update step. Specifically, we propose the ExpanderGNN model with a tuneable sparsification rate and the Activation-Only GNN, which has no linear transform in the Update step. In agreement with a growing trend in the literature, we change the sparsification paradigm by initialising sparse neural network architectures rather than expensively sparsifying already-trained architectures. Our novel benchmark models enable a better understanding of the influence of the Update step on model performance and outperform existing simplified benchmark models such as the Simple Graph Convolution. The ExpanderGNNs, and in some cases the Activation-Only models, achieve performance on par with their vanilla counterparts on several downstream tasks, while containing significantly fewer trainable parameters. Our code is publicly available at: https://github.com/ChangminWu/ExpanderGNN.
    Weakly-supervised Temporal Path Representation Learning with Contrastive Curriculum Learning -- Extended Version. (arXiv:2203.16110v3 [cs.LG] UPDATED)
    In step with the digitalization of transportation, we are witnessing a growing range of path-based smart-city applications, e.g., travel-time estimation and travel-path ranking. A temporal path (TP), which includes temporal information, e.g., departure time, in the path, is fundamental to enabling such applications. In this setting, it is essential to learn generic temporal path representations (TPRs) that consider spatial and temporal correlations simultaneously and that can be used in different applications, i.e., downstream tasks. Existing methods fail to achieve this goal since (i) supervised methods require large amounts of task-specific labels during training and thus fail to generalize the obtained TPRs to other tasks; and (ii) though unsupervised methods can learn generic representations, they disregard the temporal aspect, leading to sub-optimal results. To contend with the limitations of existing solutions, we propose a Weakly-Supervised Contrastive (WSC) learning model. We first propose a temporal path encoder that encodes both the spatial and temporal information of a temporal path into a TPR. To train the encoder, we introduce weak labels that are easy and inexpensive to obtain and are relevant to different tasks, e.g., temporal labels indicating peak vs. off-peak hours derived from departure times. Based on the weak labels, we construct meaningful positive and negative temporal path samples by considering both spatial and temporal information, which facilitates training the encoder with contrastive learning by pulling the representations of positive samples closer while pushing away those of negative samples. To better guide contrastive learning, we propose a learning strategy based on Curriculum Learning, such that learning proceeds from easy to hard training instances. Experimental studies verify the effectiveness of the proposed method.
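    A minimal sketch of the weak-label contrastive objective (cosine similarity, the temperature `tau`, and the "peak"/"off" labels are illustrative assumptions; the paper's encoder and sample construction are far richer):

```python
import math

def contrastive_loss(anchor, candidates, weak_labels, anchor_label, tau=0.5):
    """InfoNCE-style loss: candidates sharing the anchor's weak label (e.g.
    both peak-hour departures) are positives; the rest are negatives."""
    def sim(u, v):  # cosine similarity
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    logits = [math.exp(sim(anchor, c) / tau) for c in candidates]
    z = sum(logits)
    pos = [l for l, y in zip(logits, weak_labels) if y == anchor_label]
    return -math.log(sum(pos) / z)
```

    The loss is small when representations of same-weak-label paths are already similar to the anchor, and large otherwise, which is the pulling/pushing behavior described above.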
    Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning. (arXiv:2202.10629v2 [cs.LG] UPDATED)
    In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces several challenges, including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper introduces a new technique called model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation of the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities. A list of model reprogramming studies is actively maintained and updated at https://github.com/IBM/model-reprogramming.
    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity. (arXiv:2204.07526v1 [math.ST])
    Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
    The Distributed Information Bottleneck reveals the explanatory structure of complex systems. (arXiv:2204.07576v1 [cs.LG])
    The fruits of science are relationships made comprehensible, often by way of approximation. While deep learning is an extremely powerful way to find relationships in data, its use in science has been hindered by the difficulty of understanding the learned relationships. The Information Bottleneck (IB) is an information theoretic framework for understanding a relationship between an input and an output in terms of a trade-off between the fidelity and complexity of approximations to the relationship. Here we show that a crucial modification -- distributing bottlenecks across multiple components of the input -- opens fundamentally new avenues for interpretable deep learning in science. The Distributed Information Bottleneck throttles the downstream complexity of interactions between the components of the input, deconstructing a relationship into meaningful approximations found through deep learning without requiring custom-made datasets or neural network architectures. Applied to a complex system, the approximations illuminate aspects of the system's nature by restricting -- and monitoring -- the information about different components incorporated into the approximation. We demonstrate the Distributed IB's explanatory utility in systems drawn from applied mathematics and condensed matter physics. In the former, we deconstruct a Boolean circuit into approximations that isolate the most informative subsets of input components without requiring exhaustive search. In the latter, we localize information about future plastic rearrangement in the static structure of a sheared glass, and find the information to be more or less diffuse depending on the system's preparation. By way of a principled scheme of approximations, the Distributed IB brings much-needed interpretability to deep learning and enables unprecedented analysis of information flow through a system.
    Solving the Dirichlet problem for the Monge-Amp\`ere equation using neural networks. (arXiv:2110.03310v2 [stat.ML] UPDATED)
    The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
    Transferability Properties of Graph Neural Networks. (arXiv:2112.04629v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) are composed of layers consisting of graph convolutions and pointwise nonlinearities. Due to their invariance and stability properties, GNNs are provably successful at learning representations from data supported on moderate-scale graphs. However, they are difficult to learn on large-scale graphs. In this paper, we study the problem of training GNNs on graphs of moderate size and transferring them to large-scale graphs. We use graph limits called graphons to define limit objects for graph filters and GNNs -- graphon filters and graphon neural networks (WNNs) -- which we interpret as generative models for graph filters and GNNs. We then show that graphon filters and WNNs can be approximated by graph filters and GNNs sampled from them on weighted and stochastic graphs. Because the error of these approximations can be upper bounded, by a triangle inequality argument we can further bound the error of transferring a graph filter or a GNN across graphs. Our results show that (i) the transference error decreases with the graph size, and (ii) that graph filters have a transferability-discriminability tradeoff that in GNNs is alleviated by the scattering behavior of the nonlinearity. These findings are demonstrated empirically in a movie recommendation problem and in a decentralized control task.
    Kernel similarity matching with Hebbian neural networks. (arXiv:2204.07475v1 [cs.NE])
Recent works have derived neural networks with online correlation-based learning rules to perform \textit{kernel similarity matching}. These works applied existing linear similarity matching algorithms to nonlinear features generated with random Fourier methods. In this paper, we attempt to perform kernel similarity matching by directly learning the nonlinear features. Our algorithm proceeds by deriving and then minimizing an upper bound for the sum of squared errors between output and input kernel similarities. The construction of our upper bound leads to online correlation-based learning rules which can be implemented with a one-layer recurrent neural network. In addition to generating high-dimensional linearly separable representations, we show that our upper bound naturally yields representations which are sparse and selective for specific input patterns. We compare the approximation quality of our method to the neural random Fourier method and to variants of the popular but non-biological "Nystr{\"o}m" method for approximating the kernel matrix. Our method appears to be comparable or better than randomly sampled Nystr{\"o}m methods when the outputs are relatively low dimensional (although still potentially higher dimensional than the inputs), but less faithful when the outputs are very high dimensional.
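The Nyström baseline mentioned above approximates the full kernel matrix from a randomly sampled subset of landmark points as K ≈ C W⁺ Cᵀ. A minimal numpy sketch (the RBF kernel and landmark count are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between row-sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_approx(X, m, gamma=0.5, seed=0):
    """Randomly sampled Nystrom approximation K ~= C W^+ C^T,
    where C is the n x m cross-kernel and W the m x m landmark kernel."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)
    W = rbf_kernel(X[idx], X[idx], gamma)
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
K = rbf_kernel(X, X)                       # exact 60 x 60 kernel matrix
K_hat = nystrom_approx(X, m=40)            # low-rank approximation
err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

With most of the points used as landmarks, the relative error is small; the paper's comparison varies the output dimensionality instead.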
    Prototype-based Domain Generalization Framework for Subject-Independent Brain-Computer Interfaces. (arXiv:2204.07358v1 [eess.SP])
Brain-computer interface (BCI) is challenging to use in practice due to the inter/intra-subject variability of electroencephalography (EEG). The BCI system, in general, necessitates a calibration technique to obtain subject/session-specific data in order to tune the model each time the system is utilized. This issue is acknowledged as a key hindrance to BCI, and a new strategy based on domain generalization has recently evolved to address it. In light of this, we have concentrated on developing an EEG classification framework that can be applied directly to data from unknown domains (i.e. subjects), using only data acquired from separate subjects previously. For this purpose, in this paper, we propose a framework that employs the open-set recognition technique as an auxiliary task to learn subject-specific style features from the source dataset, while helping the shared feature extractor map the features of the unseen target dataset as a new unseen domain. Our aim is to impose cross-instance style invariance in the same domain and reduce the open space risk on the potential unseen subject in order to improve the generalization ability of the shared feature extractor. Our experiments showed that using the domain information as an auxiliary network increases the generalization performance.  ( 2 min )
    End-to-End Sensitivity-Based Filter Pruning. (arXiv:2204.07412v1 [cs.CV])
In this paper, we present a novel sensitivity-based filter pruning algorithm (SbF-Pruner) to learn the importance scores of the filters of each layer end-to-end. Our method learns the scores from the filter weights, enabling it to account for the correlations between the filters of each layer. Moreover, by training the pruning scores of all layers simultaneously, our method can account for layer interdependencies, which is essential to find a performant sparse sub-network. Our proposed method can train and generate a pruned network from scratch in a straightforward, one-stage training process without requiring a pretrained network. Ultimately, we do not need layer-specific hyperparameters or pre-defined layer budgets, since SbF-Pruner can implicitly determine the appropriate number of channels in each layer. Our experimental results on different network architectures suggest that SbF-Pruner outperforms advanced pruning methods. Notably, on CIFAR-10, without requiring a pretrained baseline network, we obtain 1.02% and 1.19% accuracy gains on ResNet56 and ResNet110, compared to the baseline reported for state-of-the-art pruning algorithms. At the same time, SbF-Pruner reduces the parameter count by 52.3% (for ResNet56) and 54% (for ResNet101), exceeding the state-of-the-art pruning algorithms by large margins of 9.5% and 6.6%.  ( 2 min )
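The basic mechanics of filter pruning can be sketched with a much simpler scoring rule than the learned sensitivity scores described above: rank each output filter by a weight statistic and zero out the lowest-ranked ones. The L1-norm proxy below is an assumption for illustration; SbF-Pruner instead learns the scores end-to-end.

```python
import numpy as np

def filter_scores(conv_weight):
    """Per-filter importance from the weights themselves.
    Here: a plain L1-norm proxy (SbF-Pruner learns these scores instead)."""
    out_ch = conv_weight.shape[0]
    return np.abs(conv_weight.reshape(out_ch, -1)).sum(axis=1)

def prune_mask(conv_weight, sparsity=0.5):
    """Boolean mask keeping the highest-scoring filters."""
    s = filter_scores(conv_weight)
    k = int(len(s) * (1 - sparsity))
    keep = np.argsort(s)[::-1][:k]
    mask = np.zeros(len(s), dtype=bool)
    mask[keep] = True
    return mask

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 3, 3, 3))      # (out_ch, in_ch, kH, kW) conv weights
mask = prune_mask(w, sparsity=0.5)
print(mask.sum())  # → 4
```

A fixed per-layer sparsity like this is exactly the hand-set budget that SbF-Pruner's learned scores avoid.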
    Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method. (arXiv:2204.07390v1 [cs.CL])
Email is one of the most widely used ways to communicate, with millions of people and businesses relying on it to communicate and share knowledge and information on a daily basis. Nevertheless, the rise in email users has led to a dramatic increase in spam emails in recent years. Processing and managing emails properly for individuals and companies is getting increasingly difficult. This article proposes a novel technique for email spam detection that is based on a combination of convolutional neural networks, gated recurrent units, and attention mechanisms. During system training, the network is selectively focused on necessary parts of the email text. The usage of convolution layers to extract more meaningful, abstract, and generalizable features by hierarchical representation is the major contribution of this study. Additionally, this contribution incorporates cross-dataset evaluation, which enables the generation of performance results more independent of the model's training dataset. According to cross-dataset evaluation results, the proposed technique advances the results of the present attention-based techniques by utilizing temporal convolutions, which provide more flexible receptive field sizes. The suggested technique's findings are compared to those of state-of-the-art models and show that our approach outperforms them.  ( 2 min )
    Towards Building a Personalized Dialogue Generator via Implicit User Persona Detection. (arXiv:2204.07372v1 [cs.CL])
Current works in the generation of personalized dialogue primarily contribute to the agent avoiding a contradictory persona and making the response more informative. However, we found that the generated responses from these models are mostly self-centered, with little care for the other party, since they ignore the user's persona. Moreover, we argue that high-quality communication is essentially built on apprehending the persona of the other party. Motivated by this, we propose a novel personalized dialogue generator that detects the implicit user persona. Because it is difficult to collect a large number of personas for each user, we attempt to model the user's potential persona and its representation from the dialogue alone, in the absence of any external information. Perception and fader variables are conceived using Conditional Variational Inference. The two latent variables simulate the process of people becoming aware of the other party's persona and producing the corresponding expression in conversation. Finally, Posterior-discriminated Regularization is presented to enhance the training procedure. Empirical studies demonstrate that, compared with state-of-the-art methods, ours is more concerned with the user's persona and outperforms them in evaluations.  ( 2 min )
    Anomalous Sound Detection Based on Machine Activity Detection. (arXiv:2204.07353v1 [eess.AS])
We have developed an unsupervised anomalous sound detection method for machine condition monitoring that utilizes an auxiliary task -- detecting when the target machine is active. First, we train a model that detects machine activity by using normal data with machine activity labels, and then use the activity-detection error as the anomaly score for a given sound clip if we have access to the ground-truth activity labels in the inference phase. If these labels are not available, the anomaly score is calculated through outlier detection on the embedding vectors obtained by the activity-detection model. Solving this auxiliary task enables the model to learn the difference between the target machine sounds and similar background noise, which makes it possible to identify small deviations in the target sounds. Experimental results showed that the proposed method complements and improves the anomaly-detection performance of the conventional method when combined in an ensemble.  ( 2 min )
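The label-free branch above scores a clip by how much of an outlier its embedding is relative to embeddings of normal data. The abstract does not specify the outlier detector; a Mahalanobis-distance score fit on normal embeddings is one common choice, sketched here with random stand-in embeddings.

```python
import numpy as np

def fit_gaussian(embeddings):
    """Fit mean/precision of embeddings computed from *normal* clips only."""
    mu = embeddings.mean(axis=0)
    cov = np.cov(embeddings, rowvar=False) + 1e-6 * np.eye(embeddings.shape[1])
    return mu, np.linalg.inv(cov)

def anomaly_score(z, mu, prec):
    """Mahalanobis distance of embedding z; higher = more anomalous."""
    d = z - mu
    return float(d @ prec @ d)

rng = np.random.default_rng(1)
normal_emb = rng.normal(0.0, 1.0, size=(500, 4))   # stand-in embeddings
mu, prec = fit_gaussian(normal_emb)
score_normal = anomaly_score(np.zeros(4), mu, prec)
score_outlier = anomaly_score(np.full(4, 6.0), mu, prec)
```

An embedding far from the normal-data cloud receives a much larger score, which is what flags anomalous clips.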
    SSR-HEF: Crowd Counting with Multi-Scale Semantic Refining and Hard Example Focusing. (arXiv:2204.07406v1 [cs.CV])
Crowd counting based on density maps is generally regarded as a regression task. Deep learning is used to learn the mapping between image content and crowd density distribution. Although great success has been achieved, some pedestrians far away from the camera are difficult to detect, and such hard examples are often numerous. Existing methods with a simple Euclidean distance loss indiscriminately optimize the hard and easy examples, so the densities of hard examples are usually incorrectly predicted to be lower or even zero, which results in large counting errors. To address this problem, we are the first to propose the Hard Example Focusing (HEF) algorithm for the regression task of crowd counting. The HEF algorithm makes our model rapidly focus on hard examples by attenuating the contribution of easy examples. Then, higher importance is given to the hard examples with wrong estimations. Moreover, the scale variations in crowd scenes are large, and scale annotations are labor-intensive and expensive. By proposing a multi-Scale Semantic Refining (SSR) strategy, lower layers of our model can break through the limitation of deep learning to capture semantic features of different scales and sufficiently deal with the scale variation. We perform extensive experiments on six benchmark datasets to verify the proposed method. Results indicate the superiority of our proposed method over the state-of-the-art methods. Moreover, our designed model is smaller and faster.  ( 2 min )
    Crowd counting with segmentation attention convolutional neural network. (arXiv:2204.07380v1 [cs.CV])
Deep learning occupies an undisputed dominance in crowd counting. In this paper, we propose a novel convolutional neural network (CNN) architecture called SegCrowdNet. Despite the complex background in crowd scenes, the proposed SegCrowdNet adaptively highlights the human head region and suppresses the non-head region by segmentation. With the guidance of an attention mechanism, the proposed SegCrowdNet pays more attention to the human head region and automatically encodes a highly refined density map. The crowd count can be obtained by integrating the density map. To adapt to the variation of crowd counts, SegCrowdNet classifies the crowd count of each image into several groups. In addition, multi-scale features are learned and extracted in the proposed SegCrowdNet to overcome the scale variations of the crowd. To verify the effectiveness of our proposed method, extensive experiments are conducted on four challenging datasets. The results demonstrate that our proposed SegCrowdNet achieves excellent performance compared with the state-of-the-art methods.  ( 2 min )
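The "count by integrating the density map" step used by these density-map methods is simple to make concrete: each annotated head contributes one unit-mass Gaussian blob, so summing the map recovers the count. A minimal sketch (the toy head locations and kernel width are illustrative assumptions):

```python
import numpy as np

def density_map(points, shape, sigma=2.0):
    """Ground-truth density map: one unit-mass Gaussian per head
    annotation, so the map integrates (sums) to the crowd count."""
    H, W = shape
    yy, xx = np.mgrid[0:H, 0:W]
    dm = np.zeros(shape)
    for (y, x) in points:
        g = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma ** 2))
        dm += g / g.sum()                  # normalize each blob to mass 1
    return dm

heads = [(10, 12), (30, 40), (31, 41)]     # toy head annotations
dm = density_map(heads, (64, 64))
print(round(dm.sum()))  # → 3
```

A counting network is then trained to regress maps like `dm` from images, and its predicted map is summed at test time.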
    Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. (arXiv:2204.07293v1 [stat.ML])
We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods and neural networks). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulations confirm that the proposed algorithm outperforms existing classic and recent variable selection methods.  ( 2 min )
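A point estimate of the measure $\psi_j$ above can be approximated for a black-box model with central finite differences averaged over a sample. This is only a minimal sketch of the importance score itself; the finite-difference scheme and the toy model are assumptions, and the paper's contribution is the posterior over these quantities, not this plug-in estimate.

```python
import numpy as np

def variable_importance(f, X, eps=1e-4):
    """Approximate psi_j = || d f / d x^j ||_2^2 empirically over sample X,
    using central finite differences so f can be a black box."""
    n, d = X.shape
    psi = np.zeros(d)
    for j in range(d):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        grad_j = (f(Xp) - f(Xm)) / (2 * eps)   # per-sample partial derivative
        psi[j] = np.mean(grad_j ** 2)          # empirical squared-gradient norm
    return psi

# Toy check: f depends only on x^0, so psi_0 should dominate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
psi = variable_importance(lambda X: 2.0 * X[:, 0], X)
print(psi)  # ~[4, 0, 0]: only the first variable matters
```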
    Spatio-Temporal-Frequency Graph Attention Convolutional Network for Aircraft Recognition Based on Heterogeneous Radar Network. (arXiv:2204.07360v1 [eess.SP])
This paper proposes a knowledge-and-data-driven graph neural network-based collaborative learning model for reliable aircraft recognition in a heterogeneous radar network. The aircraft recognizability analysis shows that: (1) the semantic feature of an aircraft is its motion pattern, driven by its kinetic characteristics, and (2) the grammatical features contained in the radar cross-section (RCS) signals present spatial-temporal-frequency (STF) diversity determined by both the electromagnetic radiation shape and the motion pattern of the aircraft. An STF graph attention convolutional network (STFGACN) is then developed to distill semantic features from the RCS signals received by the heterogeneous radar network. Extensive experimental results verify that the STFGACN outperforms the baseline methods in terms of detection accuracy, and ablation experiments further show that expanding the information dimension yields considerable benefits for robust performance in the low signal-to-noise ratio region.  ( 2 min )
    Structural Analysis of Branch-and-Cut and the Learnability of Gomory Mixed Integer Cuts. (arXiv:2204.07312v1 [math.OC])
    The incorporation of cutting planes within the branch-and-bound algorithm, known as branch-and-cut, forms the backbone of modern integer programming solvers. These solvers are the foremost method for solving discrete optimization problems and thus have a vast array of applications in machine learning, operations research, and many other fields. Choosing cutting planes effectively is a major research topic in the theory and practice of integer programming. We conduct a novel structural analysis of branch-and-cut that pins down how every step of the algorithm is affected by changes in the parameters defining the cutting planes added to the input integer program. Our main application of this analysis is to derive sample complexity guarantees for using machine learning to determine which cutting planes to apply during branch-and-cut. These guarantees apply to infinite families of cutting planes, such as the family of Gomory mixed integer cuts, which are responsible for the main breakthrough speedups of integer programming solvers. We exploit geometric and combinatorial structure of branch-and-cut in our analysis, which provides a key missing piece for the recent generalization theory of branch-and-cut.  ( 2 min )
    Knowledgebra: An Algebraic Learning Framework for Knowledge Graph. (arXiv:2204.07328v1 [cs.LG])
Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that knowledge contained in a dataset can be consistently represented. Dense embeddings trained from KG datasets benefit a variety of downstream tasks such as KG completion and link prediction. However, existing KG embedding methods fall short of providing a systematic solution for the global consistency of knowledge representation. We developed a mathematical language for KG based on an observation of their inherent algebraic structure, which we term Knowledgebra. By analyzing five distinct algebraic properties, we proved that the semigroup is the most reasonable algebraic structure for the relation embedding of a general knowledge graph. We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets. Moreover, we proposed a regularization-based method to integrate chain-like logic rules derived from human knowledge into embedding training, which further demonstrates the power of the developed language. As far as we know, by applying abstract algebra in statistical learning, this work develops the first formal language for general knowledge graphs, and also sheds light on the problem of neural-symbolic integration from an algebraic perspective.  ( 2 min )
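The appeal of a matrix-semigroup relation embedding is that relation composition becomes matrix multiplication, which is closed and associative. The tiny sketch below illustrates this property with random (untrained) matrices and a generic distance-based triple score; it is not the paper's trained SemE model.

```python
import numpy as np

def score(h, M, t):
    """Distance-based triple score: small when the head entity h,
    transformed by the relation matrix M, lands near the tail t."""
    return np.linalg.norm(h @ M - t)

rng = np.random.default_rng(0)
M_parent = rng.normal(size=(4, 4))       # matrix embedding of "parent_of"
M_grand = M_parent @ M_parent            # relation composition = matrix product
h = rng.normal(size=4)                   # head entity embedding
t = h @ M_parent @ M_parent              # entity reached by applying "parent_of" twice
print(score(h, M_grand, t) < 1e-9)       # composed relation explains the triple
```

Associativity of matrix multiplication is exactly the semigroup axiom the paper argues a general KG relation algebra needs.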
    XDBERT: Distilling Visual Information to BERT from Cross-Modal Systems to Improve Language Understanding. (arXiv:2204.07316v1 [cs.CL])
Transformer-based models are widely used in natural language understanding (NLU) tasks, and multimodal transformers have been effective in visual-language tasks. This study explores distilling visual information from pretrained multimodal transformers to pretrained language encoders. Our framework is inspired by cross-modal encoders' success in visual-language tasks, while we alter the learning objective to cater to the language-heavy characteristics of NLU. After a small number of extra adaptation steps and finetuning, the proposed XDBERT (cross-modal distilled BERT) outperforms pretrained BERT on the general language understanding evaluation (GLUE) benchmark, situations with adversarial generations (SWAG), and readability benchmarks. We analyze the performance of XDBERT on GLUE to show that the improvement is likely visually grounded.  ( 2 min )
    Ensemble diverse hypotheses and knowledge distillation for unsupervised cross-subject adaptation. (arXiv:2204.07308v1 [cs.RO])
Recognizing human locomotion intent and activities is important for controlling wearable robots while walking in complex environments. However, human-robot interface signals are usually user-dependent, so a classifier trained on source subjects performs poorly on new subjects. To address this issue, this paper designs the ensemble diverse hypotheses and knowledge distillation (EDHKD) method to realize unsupervised cross-subject adaptation. EDH mitigates the divergence between labeled data of source subjects and unlabeled data of target subjects to accurately classify the locomotion modes of target subjects without labeling their data. Compared to previous domain adaptation methods based on a single learner, which may only learn a subset of features from the input signals, EDH can learn diverse features by incorporating multiple diverse feature generators, and thus increases the accuracy and decreases the variance of classifying target data, but it sacrifices efficiency. To solve this problem, EDHKD (student) distills the knowledge from EDH (teacher) into a single network to remain efficient and accurate. The performance of EDHKD is theoretically proved and experimentally validated on a 2D moon dataset and two public human locomotion datasets. Experimental results show that EDHKD outperforms all other methods. EDHKD can classify target data with 96.9%, 94.4%, and 97.4% average accuracy on the above three datasets with a short computing time (1 ms). Compared to a benchmark (BM) method, EDHKD increases average accuracy by 1.3% and 7.1% for classifying the locomotion modes of target subjects. EDHKD also stabilizes the learning curves. Therefore, EDHKD is significant for increasing the generalization ability and efficiency of human intent prediction and human activity recognition systems, which will improve human-robot interactions.  ( 2 min )
    Methodical Advice Collection and Reuse in Deep Reinforcement Learning. (arXiv:2204.07254v1 [cs.LG])
    Reinforcement learning (RL) has shown great success in solving many challenging tasks via use of deep neural networks. Although using deep learning for RL brings immense representational power, it also causes a well-known sample-inefficiency problem. This means that the algorithms are data-hungry and require millions of training samples to converge to an adequate policy. One way to combat this issue is to use action advising in a teacher-student framework, where a knowledgeable teacher provides action advice to help the student. This work considers how to better leverage uncertainties about when a student should ask for advice and if the student can model the teacher to ask for less advice. The student could decide to ask for advice when it is uncertain or when both it and its model of the teacher are uncertain. In addition to this investigation, this paper introduces a new method to compute uncertainty for a deep RL agent using a secondary neural network. Our empirical results show that using dual uncertainties to drive advice collection and reuse may improve learning performance across several Atari games.  ( 2 min )
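The dual-uncertainty advice rule described above can be sketched with a simple epistemic-uncertainty proxy: the variance of greedy-action values across an ensemble of Q-heads. This is an illustrative assumption on two counts: the paper computes uncertainty with a secondary neural network rather than an ensemble, and the threshold below is arbitrary.

```python
import numpy as np

def uncertainty(q_heads):
    """Epistemic uncertainty proxy: variance of each head's greedy value
    across an ensemble of Q-heads (rows = heads, cols = actions)."""
    greedy = q_heads.max(axis=1)               # per-head greedy value
    return float(greedy.var())

def should_ask_teacher(student_q, teacher_model_q, thresh=0.1):
    """Ask only when both the student and its model of the teacher
    are uncertain about this state."""
    return uncertainty(student_q) > thresh and uncertainty(teacher_model_q) > thresh

confident = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])  # 3 heads agree
uncertain = np.array([[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]])  # heads disagree
print(should_ask_teacher(uncertain, uncertain))  # → True
print(should_ask_teacher(confident, uncertain))  # → False
```

Gating advice requests this way rations the teacher's limited advice budget to states where it is actually informative.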
    A Differentially Private Probabilistic Framework for Modeling the Variability Across Federated Datasets of Heterogeneous Multi-View Observations. (arXiv:2204.07352v1 [cs.LG])
We propose a novel federated learning paradigm to model data variability among heterogeneous clients in multi-centric studies. Our method is expressed through a hierarchical Bayesian latent variable model, where client-specific parameters are assumed to be realizations from a global distribution at the master level, which is in turn estimated to account for data bias and variability across clients. We show that our framework can be effectively optimized through expectation maximization (EM) over the latent master distribution and the clients' parameters. We also introduce formal differential privacy (DP) guarantees compatible with our EM optimization scheme. We tested our method on the analysis of multi-modal medical imaging data and clinical scores from distributed clinical datasets of patients affected by Alzheimer's disease. We demonstrate that our method is robust when data is distributed in either iid or non-iid manners, even when local parameter perturbation is included to provide DP guarantees. Moreover, the variability of data, views and centers can be quantified in an interpretable manner, while guaranteeing high-quality data reconstruction as compared to state-of-the-art autoencoding models and federated learning schemes. The code is available at https://gitlab.inria.fr/epione/federated-multi-views-ppca.  ( 2 min )
    Crowd counting with crowd attention convolutional neural network. (arXiv:2204.07347v1 [cs.CV])
Crowd counting is a challenging problem due to scene complexity and scale variation. Although deep learning has achieved great improvement in crowd counting, scene complexity affects the judgement of these methods and they often mistakenly regard some objects as people, causing potentially enormous errors in the crowd counting result. To address the problem, we propose a novel end-to-end model called Crowd Attention Convolutional Neural Network (CAT-CNN). Our CAT-CNN can adaptively assess the importance of a human head at each pixel location by automatically encoding a confidence map. With the guidance of the confidence map, the position of each human head in the estimated density map receives more attention when encoding the final density map, which effectively avoids enormous misjudgements. The crowd count can be obtained by integrating the final density map. To encode a highly refined density map, the total crowd count of each image is classified in a designed classification task, and we are the first to explicitly map the prior of the population-level category to feature maps. To verify the efficiency of our proposed method, extensive experiments are conducted on three highly challenging datasets. Results establish the superiority of our method over many state-of-the-art methods.  ( 2 min )
    Graph Pooling for Graph Neural Networks: Progress, Challenges, and Opportunities. (arXiv:2204.07321v1 [cs.LG])
    Graph neural networks have emerged as a leading architecture for many graph-level tasks such as graph classification and graph generation with a notable improvement. Among these tasks, graph pooling is an essential component of graph neural network architectures for obtaining a holistic graph-level representation of the entire graph. Although a great variety of methods have been proposed in this promising and fast-developing research field, to the best of our knowledge, little effort has been made to systematically summarize these methods. To set the stage for the development of future works, in this paper, we attempt to fill this gap by providing a broad review of recent methods on graph pooling. Specifically, 1) we first propose a taxonomy of existing graph pooling methods and provide a mathematical summary for each category; 2) next, we provide an overview of the libraries related to graph pooling, including the commonly used datasets, model architectures for downstream tasks, and open-source implementations; 3) then, we further outline in brief the applications that incorporate the idea of graph pooling in a number of domains; 4) and finally, we discuss some critical challenges faced by the current studies and share our insights on potential directions for improving graph pooling in the future.  ( 2 min )
    Revisiting the Adversarial Robustness-Accuracy Tradeoff in Robot Learning. (arXiv:2204.07373v1 [cs.RO])
    Adversarial training (i.e., training on adversarially perturbed input data) is a well-studied method for making neural networks robust to potential adversarial attacks during inference. However, the improved robustness does not come for free but rather is accompanied by a decrease in overall model accuracy and performance. Recent work has shown that, in practical robot learning applications, the effects of adversarial training do not pose a fair trade-off but inflict a net loss when measured in holistic robot performance. This work revisits the robustness-accuracy trade-off in robot learning by systematically analyzing if recent advances in robust training methods and theory in conjunction with adversarial robot learning can make adversarial training suitable for real-world robot applications. We evaluate a wide variety of robot learning tasks ranging from autonomous driving in a high-fidelity environment amenable to sim-to-real deployment, to mobile robot gesture recognition. Our results demonstrate that, while these techniques make incremental improvements on the trade-off on a relative scale, the negative side-effects caused by adversarial training still outweigh the improvements by an order of magnitude. We conclude that more substantial advances in robust learning methods are necessary before they can benefit robot learning tasks in practice.  ( 2 min )
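The adversarially perturbed inputs discussed above are commonly generated with the Fast Gradient Sign Method (FGSM): nudge every input coordinate by ±ε in the direction that increases the loss. A minimal numpy sketch with a toy linear "model" (the paper evaluates much stronger robust-training methods; this only illustrates the perturbation itself):

```python
import numpy as np

def fgsm(x, grad_wrt_x, eps=0.1):
    """Fast Gradient Sign Method: perturb x by +/- eps per coordinate,
    following the sign of the loss gradient w.r.t. the input."""
    return x + eps * np.sign(grad_wrt_x)

# Toy linear model: loss = -w.x for the true class, so d(loss)/dx = -w.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.3, -0.1])
x_adv = fgsm(x, grad_wrt_x=-w, eps=0.1)
print(x_adv)  # → [0.1, 0.4, -0.2]: each coordinate nudged to raise the loss
```

Adversarial training replaces (or mixes) clean batches with such `x_adv` batches, which is precisely what trades clean-task accuracy for robustness.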
    Pushing the Limits of Simple Pipelines for Few-Shot Learning: External Data and Fine-Tuning Make a Difference. (arXiv:2204.07305v1 [cs.CV])
Few-shot learning (FSL) is an important and topical problem in computer vision that has motivated extensive research into numerous methods, spanning from sophisticated meta-learning methods to simple transfer learning baselines. We seek to push the limits of a simple-but-effective pipeline for more realistic and practical settings of few-shot image classification. To this end, we explore few-shot learning from the perspective of neural network architecture, as well as a three-stage pipeline of network updates under different data supplies, where unsupervised external data is considered for pre-training, base categories are used to simulate few-shot tasks for meta-training, and the scarcely labelled data of a novel task is taken for fine-tuning. We investigate questions such as: (1) How does pre-training on external data benefit FSL? (2) How can state-of-the-art transformer architectures be exploited? and (3) How does fine-tuning mitigate domain shift? Ultimately, we show that a simple transformer-based pipeline yields surprisingly good performance on standard benchmarks such as Mini-ImageNet, CIFAR-FS, CDFSL and Meta-Dataset. Our code and demo are available at https://hushell.github.io/pmf.  ( 2 min )
    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. (arXiv:2204.07276v1 [cs.LG])
    Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.  ( 2 min )
    Unsupervised Probabilistic Models for Sequential Electronic Health Records. (arXiv:2204.07292v1 [cs.LG])
    We develop an unsupervised probabilistic model for heterogeneous Electronic Health Record (EHR) data. Utilizing a mixture model formulation, our approach directly models sequences of arbitrary length, such as medications and laboratory results. This allows for subgrouping and incorporation of the dynamics underlying heterogeneous data types. The model consists of a layered set of latent variables that encode underlying structure in the data. These variables represent subject subgroups at the top layer, and unobserved states for sequences in the second layer. We train this model on episodic data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The resulting properties of the trained model generate novel insight from these complex and multifaceted data. In addition, we show how the model can be used to analyze sequences that contribute to assessment of mortality likelihood.  ( 2 min )
    Causal Transformer for Estimating Counterfactual Outcomes. (arXiv:2204.07258v1 [cs.LG])
    Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inferences for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarial balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer based on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work proposing transformer-based architecture for estimating counterfactual outcomes from longitudinal data.  ( 2 min )
    Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models. (arXiv:2204.07288v1 [cs.CL])
    With many real-world applications of Natural Language Processing (NLP) comprising of long texts, there has been a rise in NLP benchmarks that measure the accuracy of models that can handle longer input sequences. However, these benchmarks do not consider the trade-offs between accuracy, speed, and power consumption as input sizes or model sizes are varied. In this work, we perform a systematic study of this accuracy vs. efficiency trade-off on two widely used long-sequence models - Longformer-Encoder-Decoder (LED) and Big Bird - during fine-tuning and inference on four datasets from the SCROLLS benchmark. To study how this trade-off differs across hyperparameter settings, we compare the models across four sequence lengths (1024, 2048, 3072, 4096) and two model sizes (base and large) under a fixed resource budget. We find that LED consistently achieves better accuracy at lower energy costs than Big Bird. For summarization, we find that increasing model size is more energy efficient than increasing sequence length for higher accuracy. However, this comes at the cost of a large drop in inference speed. For question answering, we find that smaller models are both more efficient and more accurate due to the larger training batch sizes possible under a fixed resource budget.  ( 2 min )
    Active Learning for Regression and Classification by Inverse Distance Weighting. (arXiv:2204.07177v1 [cs.LG])
    This paper proposes an active learning algorithm for solving regression and classification problems based on inverse-distance weighting functions for selecting the feature vectors to query. The algorithm has the following features: (i) supports both pool-based and population-based sampling; (ii) is independent of the type of predictor used; (iii) can handle known and unknown constraints on the queryable feature vectors; and (iv) can run either sequentially, or in batch mode, depending on how often the predictor is retrained. The method's potential is shown in numerical tests on illustrative synthetic problems and real-world regression and classification datasets from the UCI repository. A Python implementation of the algorithm, which we call IDEAL (Inverse-Distance based Exploration for Active Learning), is available at \url{this http URL}.  ( 2 min )
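    The pool-based, sequential mode can be illustrated with a minimal sketch. The acquisition below is a simplified inverse-distance-weighted exploration term (points far from everything already queried score higher), not the paper's exact criterion:

```python
import numpy as np

def idw_acquisition(pool, queried):
    """Score each pool point by an inverse-distance-weighted exploration term:
    candidates far from all previously queried points score higher.
    This is a simplified stand-in for the paper's acquisition function."""
    d2 = ((pool[:, None, :] - queried[None, :, :]) ** 2).sum(-1)  # (n_pool, n_queried)
    # IDW "density" of queried points around each candidate; low density -> explore
    density = (1.0 / (1.0 + d2)).sum(axis=1)
    return 1.0 / density

rng = np.random.default_rng(0)
pool = rng.uniform(-1, 1, size=(200, 2))
queried_idx = [0]
for _ in range(5):  # sequential mode: query one point, then re-score
    scores = idw_acquisition(pool, pool[queried_idx])
    scores[queried_idx] = -np.inf          # never re-query the same point
    queried_idx.append(int(np.argmax(scores)))
print(len(set(queried_idx)))  # prints 6
```

    In batch mode one would instead pick the top-$k$ scores per retraining round; the predictor itself is deliberately absent here, reflecting feature (ii) above.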
    Hierarchical Embedded Bayesian Additive Regression Trees. (arXiv:2204.07207v1 [stat.ME])
    We propose a simple yet powerful extension of Bayesian Additive Regression Trees which we name Hierarchical Embedded BART (HE-BART). The model allows for random effects to be included at the terminal node level of a set of regression trees, making HE-BART a non-parametric alternative to mixed effects models which avoids the need for the user to specify the structure of the random effects in the model, whilst maintaining the prediction and uncertainty calibration properties of standard BART. Using simulated and real-world examples, we demonstrate that this new extension yields superior predictions for many of the standard mixed effects models' example data sets, and yet still provides consistent estimates of the random effect variances. In a future version of this paper, we outline its use in larger, more advanced data sets and structures.  ( 2 min )
    Spatio-Temporal Analysis of Transformer based Architecture for Attention Estimation from EEG. (arXiv:2204.07162v1 [q-bio.NC])
    For many years now, understanding brain mechanisms has been a major research subject in many different fields. Brain signal processing, and especially electroencephalography (EEG), has recently seen growing interest both in academia and industry. One of the main examples is the increasing number of Brain-Computer Interfaces (BCI) aiming to link brains and computers. In this paper, we present a novel framework allowing us to retrieve the attention state, i.e., the degree of attention given to a specific task, from EEG signals. While previous methods often consider the spatial relationship in EEG through electrodes and process them in recurrent- or convolution-based architectures, we propose here to also exploit the spatial and temporal information with a transformer-based network, an architecture that has already shown its strength in many machine-learning (ML) studies, e.g., machine translation. In addition to this novel architecture, an extensive study of feature extraction methods, frequency bands, and temporal window lengths has also been carried out. The proposed network has been trained and validated on two public datasets and achieves better results than state-of-the-art models. Beyond these improved results, the framework could be used in real applications, e.g., monitoring Attention Deficit Hyperactivity Disorder (ADHD) symptoms or vigilance during a driving assessment.  ( 2 min )
    Testing distributional assumptions of learning algorithms. (arXiv:2204.07196v1 [cs.LG])
    There are many important high dimensional function classes that have fast agnostic learning algorithms when strong assumptions on the distribution of examples can be made, such as Gaussianity or uniformity over the domain. But how can one be sufficiently confident that the data indeed satisfies the distributional assumption, so that one can trust in the output quality of the agnostic learning algorithm? We propose a model by which to systematically study the design of tester-learner pairs $(\mathcal{A},\mathcal{T})$, such that if the distribution on examples in the data passes the tester $\mathcal{T}$ then one can safely trust the output of the agnostic learner $\mathcal{A}$ on the data. To demonstrate the power of the model, we apply it to the classical problem of agnostically learning halfspaces under the standard Gaussian distribution and present a tester-learner pair with a combined run-time of $n^{\tilde{O}(1/\epsilon^4)}$. This qualitatively matches that of the best known ordinary agnostic learning algorithms for this task. In contrast, finite sample Gaussian distribution testers do not exist for the $L_1$ and EMD distance measures. A key step in the analysis is a novel characterization of concentration and anti-concentration properties of a distribution whose low-degree moments approximately match those of a Gaussian. We also use tools from polynomial approximation theory. In contrast, we show strong lower bounds on the combined run-times of tester-learner pairs for the problems of agnostically learning convex sets under the Gaussian distribution and for monotone Boolean functions under the uniform distribution over $\{0,1\}^n$. Through these lower bounds we exhibit natural problems where there is a dramatic gap between standard agnostic learning run-time and the run-time of the best tester-learner pair.  ( 2 min )
    Causal Disentanglement with Network Information for Debiased Recommendations. (arXiv:2204.07221v1 [cs.IR])
    Recommender systems aim to recommend new items to users by learning user and item representations. In practice, these representations are highly entangled as they consist of information about multiple factors, including the user's interests and item attributes, along with confounding factors such as user conformity and item popularity. Considering these entangled representations for inferring user preference may lead to biased recommendations (e.g., when the recommender model recommends popular items even if they do not align with the user's interests). Recent research proposes to debias by modeling a recommender system from a causal perspective. The exposure and the ratings are analogous to the treatment and the outcome in the causal inference framework, respectively. The critical challenge in this setting is accounting for the hidden confounders. These confounders are unobserved, making it hard to measure them. On the other hand, since these confounders affect both the exposure and the ratings, it is essential to account for them in generating debiased recommendations. To better approximate hidden confounders, we propose to leverage network information (i.e., user-social and user-item networks), which are shown to influence how users discover and interact with an item. Aside from user conformity, aspects of confounding such as item popularity present in the network information are also captured in our method with the aid of \textit{causal disentanglement}, which unravels the learned representations into independent factors that are responsible for (a) modeling the exposure of an item to the user, (b) predicting the ratings, and (c) controlling the hidden confounders. Experiments on real-world datasets validate the effectiveness of the proposed model for debiasing recommender systems.  ( 2 min )
    Robotic and Generative Adversarial Attacks in Offline Writer-independent Signature Verification. (arXiv:2204.07246v1 [cs.RO])
    This study explores how robots and generative approaches can be used to mount successful false-acceptance adversarial attacks on signature verification systems. Initially, a convolutional neural network topology and data augmentation strategy are explored and tuned, producing an 87.12% accurate model for the verification of 2,640 human signatures. Two robots are then tasked with forging 50 signatures, where 25 are used for the verification attack, and the remaining 25 are used for tuning of the model to defend against them. Adversarial attacks on the system show that there exists an information security risk; the Line-us robotic arm can fool the system 24% of the time and the iDraw 2.0 robot 32% of the time. A conditional GAN finds similar success, with around 30% forged signatures misclassified as genuine. Following fine-tune transfer learning of robotic and generative data, adversarial attacks are reduced below the model threshold by both robots and the GAN. It is observed that tuning the model reduces the risk of attack by robots to 8% and 12%, and that conditional generative adversarial attacks can be reduced to 4% when 25 images are presented and 5% when 1000 images are presented.  ( 2 min )
    The training response law explains how deep neural networks learn. (arXiv:2204.07291v1 [cond-mat.dis-nn])
    Deep neural networks are among the most widely applied technologies of this decade. In spite of their fruitful applications, the mechanism behind them is still to be elucidated. We study the learning process with a very simple supervised learning encoding problem. As a result, we find a simple law in the training response, which describes the neural tangent kernel. The response consists of a power-law-like decay multiplied by a simple response kernel. We can construct a simple mean-field dynamical model with this law, which explains how the network learns. During learning, the input space is split into sub-spaces through competition between the kernels. With the iterated splits and aging, the network gains complexity but finally loses its plasticity.  ( 2 min )
    Convergence and Implicit Regularization Properties of Gradient Descent for Deep Residual Networks. (arXiv:2204.07261v1 [cs.LG])
    We prove linear convergence of gradient descent to a global minimum for the training of deep residual networks with constant layer width and smooth activation function. We further show that the trained weights, as a function of the layer index, admit a scaling limit which is H\"older continuous as the depth of the network tends to infinity. The proofs are based on non-asymptotic estimates of the loss function and of norms of the network weights along the gradient descent path. We illustrate the relevance of our theoretical results to practical settings using detailed numerical experiments on supervised learning problems.  ( 2 min )
    Learning two-phase microstructure evolution using neural operators and autoencoder architectures. (arXiv:2204.07230v1 [cond-mat.mtrl-sci])
    Phase-field modeling is an effective mesoscale method for capturing the evolution dynamics of materials, e.g., in spinodal decomposition of a two-phase mixture. However, the accuracy of high-fidelity phase field models comes at a substantial computational cost. Hence, fast and generalizable surrogate models are needed to alleviate the cost in computationally taxing processes such as in optimization and design of materials. The intrinsic discontinuous nature of the physical phenomena incurred by the presence of sharp phase boundaries makes the training of the surrogate model cumbersome. We develop a new framework that integrates a convolutional autoencoder architecture with a deep neural operator (DeepONet) to learn the dynamic evolution of a two-phase mixture. We utilize the convolutional autoencoder to provide a compact representation of the microstructure data in a low-dimensional latent space. DeepONet, which consists of two sub-networks, one for encoding the input function at a fixed number of sensor locations (branch net) and another for encoding the locations for the output functions (trunk net), learns the mesoscale dynamics of the microstructure evolution in the latent space. The decoder part of the convolutional autoencoder can then reconstruct the time-evolved microstructure from the DeepONet predictions. The result is an efficient and accurate accelerated phase-field framework that outperforms other neural-network-based approaches while at the same time being robust to noisy inputs.  ( 2 min )
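    A shape-level sketch of the DeepONet branch/trunk structure described above (untrained random weights; the latent vector below stands in for the convolutional autoencoder's code, and a single output channel is assumed for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(sizes):
    # random weight matrices for a small fully connected net (no training here)
    return [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    for W in layers[:-1]:
        x = np.tanh(x @ W)
    return x @ layers[-1]

# Branch net: encodes the latent state sampled at m "sensors".
# Trunk net: encodes the query times at which to evaluate the evolved state.
m, p = 16, 32
branch = mlp([m, 64, p])
trunk = mlp([1, 64, p])

z0 = rng.standard_normal((1, m))          # latent code from the (assumed) encoder
t = np.linspace(0, 1, 5).reshape(-1, 1)   # query times for the trunk net
# DeepONet output: inner product of branch and trunk features per (z0, t) pair
G = forward(branch, z0) @ forward(trunk, t).T
print(G.shape)  # prints (1, 5)
```

    In the paper's pipeline, each such output would be decoded back to a full microstructure field by the autoencoder's decoder.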
    Minimizing Control for Credit Assignment with Strong Feedback. (arXiv:2204.07249v1 [cs.NE])
    The success of deep learning attracted interest in whether the brain learns hierarchical representations using gradient-based learning. However, current biologically plausible methods for gradient-based credit assignment in deep neural networks need infinitesimally small feedback signals, which is problematic in biologically realistic noisy environments and at odds with experimental evidence in neuroscience showing that top-down feedback can significantly influence neural activity. Building upon deep feedback control (DFC), a recently proposed credit assignment method, we combine strong feedback influences on neural activity with gradient-based learning and show that this naturally leads to a novel view on neural network optimization. Instead of gradually changing the network weights towards configurations with low output loss, weight updates gradually minimize the amount of feedback required from a controller that drives the network to the supervised output label. Moreover, we show that the use of strong feedback in DFC allows learning forward and feedback connections simultaneously, using a learning rule fully local in space and time. We complement our theoretical results with experiments on standard computer-vision benchmarks, showing competitive performance to backpropagation as well as robustness to noise. Overall, our work presents a fundamentally novel view of learning as control minimization, while sidestepping biologically unrealistic assumptions.  ( 2 min )
    Brazilian Court Documents Clustered by Similarity Together Using Natural Language Processing Approaches with Transformers. (arXiv:2204.07182v1 [cs.AI])
    Recent advances in Artificial Intelligence (AI) have leveraged promising results in solving complex problems in the area of Natural Language Processing (NLP), being an important tool to help in the expeditious resolution of judicial proceedings in the legal area. In this context, this work targets the problem of detecting the degree of similarity between judicial documents within an inference group by applying six transformer-based NLP techniques, namely BERT, GPT-2, and RoBERTa, each pre-trained in Brazilian Portuguese and also specialized using 210,000 legal proceedings. Documents were pre-processed and had their content transformed into a vector representation using these NLP techniques. Unsupervised learning was used to cluster the lawsuits, with the quality of the model measured by the cosine of the distance between the elements of a group and its centroid. We observed that transformer-based models outperform those of previous research, with the RoBERTa model specialized in Brazilian Portuguese standing out, making it possible to advance the current state of the art in NLP applied to the legal sector.  ( 2 min )
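    The clustering-and-quality step can be sketched with random vectors standing in for the transformer embeddings (plain k-means, with quality measured as mean cosine similarity of each document to its cluster centroid):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(1)
# toy stand-ins for document embeddings: two well-separated groups in 5-D
X = np.vstack([rng.normal(0, 0.1, (20, 5)) + 1, rng.normal(0, 0.1, (20, 5)) - 1])

k = 2
centroids = X[[0, len(X) // 2]].copy()       # deterministic init, one per group
for _ in range(10):                          # Lloyd's algorithm
    labels = np.argmin(((X[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    for j in range(k):
        centroids[j] = X[labels == j].mean(axis=0)

# cluster quality: mean cosine similarity of each document to its centroid
quality = np.mean([cosine(x, centroids[l]) for x, l in zip(X, labels)])
print(quality > 0.9)  # True
```

    Real embeddings from BERT/GPT-2/RoBERTa would simply replace the synthetic `X`; the quality criterion is unchanged.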
    Relaxing Equivariance Constraints with Non-stationary Continuous Filters. (arXiv:2204.07178v1 [cs.LG])
    Equivariances provide useful inductive biases in neural network modeling, with the translation equivariance of convolutional neural networks being a canonical example. Equivariances can be embedded in architectures through weight-sharing and place symmetry constraints on the functions a neural network can represent. The type of symmetry is typically fixed and has to be chosen in advance. Although some tasks are inherently equivariant, many tasks do not strictly follow such symmetries. In such cases, equivariance constraints can be overly restrictive. In this work, we propose a parameter-efficient relaxation of equivariance that can effectively interpolate between (i) a non-equivariant linear product, (ii) a strictly equivariant convolution, and (iii) a strictly invariant mapping. The proposed parameterization can be thought of as a building block to allow adjustable symmetry structure in neural networks. Compared to non-equivariant or strictly equivariant baselines, we experimentally verify that soft equivariance leads to improved performance in terms of test accuracy on CIFAR-10 and CIFAR-100 image classification tasks.  ( 2 min )
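    A 1-D toy version of such an interpolation, between a circulant (translation-equivariant) matrix and an unconstrained linear map, illustrates the idea; this is a sketch of the concept, not the paper's parameterization:

```python
import numpy as np

def conv_matrix(kernel, n):
    """Circulant matrix built from a 1-D kernel: a strictly
    translation-equivariant linear map under cyclic shifts."""
    W = np.zeros((n, n))
    k = len(kernel)
    for i in range(n):
        for j, w in enumerate(kernel):
            W[i, (i + j - k // 2) % n] = w
    return W

n = 8
rng = np.random.default_rng(0)
W_conv = conv_matrix(rng.standard_normal(3), n)   # strictly equivariant
W_free = rng.standard_normal((n, n))              # unconstrained linear map

def relaxed(x, alpha):
    """alpha = 0 -> strict translation equivariance; alpha = 1 -> none."""
    W = (1 - alpha) * W_conv + alpha * W_free
    return W @ x

x = rng.standard_normal(n)
shift = lambda v: np.roll(v, 1)
equiv_err = lambda a: np.linalg.norm(relaxed(shift(x), a) - shift(relaxed(x, a)))
print(equiv_err(0.0) < 1e-12, equiv_err(1.0) > 1e-3)  # True True
```

    Intermediate `alpha` values give "soft" equivariance: the equivariance error grows continuously from zero as the constraint is relaxed.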
    Alternating Mahalanobis Distance Minimization for Stable and Accurate CP Decomposition. (arXiv:2204.07208v1 [cs.LG])
    CP decomposition (CPD) is prevalent in chemometrics, signal processing, data mining and many more fields. While many algorithms have been proposed to compute the CPD, alternating least squares (ALS) remains one of the most widely used algorithms for computing the decomposition. Recent works have introduced the notion of eigenvalues and singular values of a tensor and explored applications of eigenvectors and singular vectors in areas like signal processing, data analytics and in various other fields. We introduce a new formulation for deriving singular values and vectors of a tensor by considering the critical points of a function different from what is used in the previous work. Computing these critical points in an alternating manner motivates an alternating optimization algorithm which corresponds to the alternating least squares algorithm in the matrix case. However, for tensors with order greater than or equal to $3$, it minimizes an objective function which is different from the commonly used least squares loss. Alternating optimization of this new objective leads to simple updates to the factor matrices with the same asymptotic computational cost as ALS. We show that a subsweep of this algorithm can achieve a superlinear convergence rate for exact CPD with known rank and verify it experimentally. We then view the algorithm as optimizing a Mahalanobis distance with respect to each factor with ground metric dependent on the other factors. This perspective allows us to generalize our approach to interpolate between updates corresponding to the ALS and the new algorithm to manage the tradeoff between stability and fitness of the decomposition. Our experimental results show that for approximating synthetic and real-world tensors, this algorithm and its variants converge to a better-conditioned decomposition with comparable and sometimes better fitness as compared to the ALS algorithm.  ( 2 min )
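    For reference, the baseline ALS algorithm that the paper's Mahalanobis-distance variant modifies can be written in a few lines of NumPy (standard mode-n unfolding and Khatri-Rao identities; the new objective itself is not shown):

```python
import numpy as np

def unfold(T, mode):
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def khatri_rao(A, B):
    # column-wise Kronecker product, shape (I*J, R), with A's index slower
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def als_cp(T, rank, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    A, B, C = (rng.standard_normal((s, rank)) for s in T.shape)
    for _ in range(iters):
        # each step is an exact least-squares solve for one factor
        A = unfold(T, 0) @ np.linalg.pinv(khatri_rao(B, C).T)
        B = unfold(T, 1) @ np.linalg.pinv(khatri_rao(A, C).T)
        C = unfold(T, 2) @ np.linalg.pinv(khatri_rao(A, B).T)
    return A, B, C

# exact rank-2 tensor: ALS should drive the relative fit error toward zero
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((5, 2)) for _ in range(3))
T = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = als_cp(T, rank=2)
err = np.linalg.norm(T - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(T)
print(err)
```

    The paper's variant replaces each least-squares subproblem with a Mahalanobis-distance minimization whose ground metric depends on the other two factors, trading some fitness for conditioning.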
    Harnessing Interpretable Machine Learning for Origami Feature Design and Pattern Selection. (arXiv:2204.07235v1 [cond-mat.soft])
    Engineering design of origami systems is challenging because comparing different origami patterns requires using categorical features and evaluating multi-physics behavior targets introduces multi-objective problems. This work shows that a decision tree machine learning method is particularly suitable for the inverse design of origami. This interpretable machine learning method can reveal complex interactions between categorical features and continuous features for comparing different origami patterns, can tackle multi-objective problems for designing active origami with multi-physics performance targets, and can extend existing origami shape fitting algorithms to further consider non-geometrical performances of origami systems. The proposed framework shows a holistic way of designing active origami systems for various applications such as metamaterials, deployable structures, soft robots, biomedical devices, and many more.  ( 2 min )
    Physics-Aware Recurrent Convolutional (PARC) Neural Networks to Assimilate Meso-scale Reactive Mechanics of Energetic Materials. (arXiv:2204.07234v1 [cond-mat.mtrl-sci])
    The thermomechanical properties of energetic materials (EM) are known to be a function of their microscopic structures, i.e., morphological configurations of crystals and pores. This microstructural dependency has motivated vigorous research in the EM community, seeking to engineer material microstructures with targeted properties and performance under the materials-by-design paradigm. However, establishing the complex structure-property-performance (SPP) relationships of EMs demands extensive experimental and simulation efforts, and assimilating and encapsulating these relationships in usable models is a challenge. Here, we present a novel deep learning method, Physics-Aware Recurrent Convolutional (PARC) Neural Network, that can "learn" the mesoscale thermo-mechanics of EM microstructures during the shock-to-detonation transition (SDT). We show that this new approach can produce accurate high-fidelity predictions of time-evolving temperature and pressure fields of the same quality as the state-of-the-art direct numerical simulations (DNS), despite the dramatic reduction of computing time, from hours and days on a high-performance computing cluster (HPC) to a little more than a second on a commodity laptop. We also demonstrate that PARC can provide physical insights, i.e., the artificial neurons can illuminate the underlying physics by identifying which microstructural features lead to critical hotspots and what characterizes "critical" versus "non-critical" microstructures. This new knowledge, generated alongside the capacity to conduct high-throughput experiments, will broaden our theoretical understanding of the initiation mechanisms of EM detonation, as a step towards engineering EMs with specific properties.  ( 2 min )
    Diagnosing and Fixing Manifold Overfitting in Deep Generative Models. (arXiv:2204.07172v1 [stat.ML])
    Likelihood-based, or explicit, deep generative models use neural networks to construct flexible high-dimensional densities. This formulation directly contradicts the manifold hypothesis, which states that observed data lies on a low-dimensional manifold embedded in high-dimensional ambient space. In this paper we investigate the pathologies of maximum-likelihood training in the presence of this dimensionality mismatch. We formally prove that degenerate optima are achieved wherein the manifold itself is learned but not the distribution on it, a phenomenon we call manifold overfitting. We propose a class of two-step procedures consisting of a dimensionality reduction step followed by maximum-likelihood density estimation, and prove that they recover the data-generating distribution in the nonparametric regime, thus avoiding manifold overfitting. We also show that these procedures enable density estimation on the manifolds learned by implicit models, such as generative adversarial networks, hence addressing a major shortcoming of these models. Several recently proposed methods are instances of our two-step procedures; we thus unify, extend, and theoretically justify a large class of models.  ( 2 min )
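    The two-step recipe is easiest to see in its simplest linear instance, with PCA as the dimensionality-reduction step and a Gaussian MLE as the density step; in the paper, deep models (e.g., autoencoders followed by explicit density estimators) take these roles:

```python
import numpy as np

rng = np.random.default_rng(0)
# data on a 1-D manifold (a line) embedded in 2-D ambient space:
# a full-dimensional density on R^2 would be degenerate here
t = rng.standard_normal(500)
X = np.stack([t, 2 * t], axis=1)

# Step 1: dimensionality reduction (PCA down to the manifold dimension)
mu = X.mean(0)
U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)
Z = (X - mu) @ Vt[:1].T          # 1-D latent coordinates along the manifold

# Step 2: maximum-likelihood density estimation in the latent space
m, s = Z.mean(), Z.std()         # Gaussian MLE on the latent coordinate
print(abs(m) < 0.2, abs(s - 5 ** 0.5) < 0.3)
```

    The latent coordinate stretches `t` by a factor of sqrt(5) (the norm of the direction (1, 2)), so the fitted scale should be close to sqrt(5); fitting a 2-D density directly would instead concentrate all mass on the line, the manifold-overfitting pathology the paper analyzes.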
    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. (arXiv:2204.07276v1 [cs.LG])
    Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.
    Solving the Dirichlet problem for the Monge-Amp\`ere equation using neural networks. (arXiv:2110.03310v2 [stat.ML] UPDATED)
    The Monge-Amp\`ere equation is a fully nonlinear partial differential equation (PDE) of fundamental importance in analysis, geometry and in the applied sciences. In this paper we solve the Dirichlet problem associated with the Monge-Amp\`ere equation using neural networks and we show that an ansatz using deep input convex neural networks can be used to find the unique convex solution. As part of our analysis we study the effect of singularities, discontinuities and noise in the source function, we consider nontrivial domains, and we investigate how the method performs in higher dimensions. We also compare this method to an alternative approach in which standard feed-forward networks are used together with a loss function which penalizes lack of convexity.
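    The structural idea behind the input convex ansatz, nonnegative weights on the hidden-to-hidden path so the output is convex in the input, can be sketched as follows (architecture details simplified relative to the paper, untrained random weights):

```python
import numpy as np

rng = np.random.default_rng(0)

class ICNN:
    """Minimal input convex neural network: z_{k+1} = relu(W z_k + A x + b),
    with the W's (and output weights) constrained nonnegative so that the
    scalar output is a convex function of the input x."""
    def __init__(self, dim, width, depth):
        self.A = [rng.standard_normal((width, dim)) for _ in range(depth)]
        self.W = [np.abs(rng.standard_normal((width, width))) for _ in range(depth - 1)]
        self.b = [rng.standard_normal(width) for _ in range(depth)]
        self.w_out = np.abs(rng.standard_normal(width))

    def __call__(self, x):
        z = np.maximum(self.A[0] @ x + self.b[0], 0)
        for W, A, b in zip(self.W, self.A[1:], self.b[1:]):
            z = np.maximum(W @ z + A @ x + b, 0)
        return self.w_out @ z

f = ICNN(dim=2, width=16, depth=3)
# numerically check midpoint convexity along a random segment
x, y = rng.standard_normal(2), rng.standard_normal(2)
mid, avg = f((x + y) / 2), (f(x) + f(y)) / 2
print(mid <= avg + 1e-9)  # True
```

    Convexity holds by induction: each layer composes a nondecreasing convex activation with a nonnegative combination of convex functions plus an affine term, which is why such an ansatz can represent the unique convex solution sought in the paper.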
    Adjoined Networks: A Training Paradigm with Applications to Network Compression. (arXiv:2006.05624v5 [cs.LG] UPDATED)
    Compressing deep neural networks while maintaining accuracy is important when we want to deploy large, powerful models in production and/or edge devices. One common technique used to achieve this goal is knowledge distillation. Typically, the output of a static pre-defined teacher (a large base network) is used as soft labels to train and transfer information to a student (or smaller) network. In this paper, we introduce Adjoined Networks, or AN, a learning paradigm that trains both the original base network and the smaller compressed network together. In our training approach, the parameters of the smaller network are shared across both the base and the compressed networks. Using our training paradigm, we can simultaneously compress (the student network) and regularize (the teacher network) any architecture. In this paper, we focus on popular CNN-based architectures used for computer vision tasks. We conduct an extensive experimental evaluation of our training paradigm on various large-scale datasets. Using ResNet-50 as the base network, AN achieves 71.8% top-1 accuracy with only 1.8M parameters and 1.6 GFLOPs on the ImageNet data-set. We further propose Differentiable Adjoined Networks (DAN), a training paradigm that augments AN by using neural architecture search to jointly learn both the width and the weights for each layer of the smaller network. DAN achieves ResNet-50 level accuracy on ImageNet with $3.8\times$ fewer parameters and $2.2\times$ fewer FLOPs.
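    The core mechanism, the student's parameters being a shared subset of the teacher's rather than an independent copy, can be sketched with a single layer (array slicing as an assumed stand-in for the paper's sharing scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
# base ("teacher") layer weight; the compressed ("student") layer is a slice of it,
# so the two networks share parameters instead of the student having its own copy
W_base = rng.standard_normal((64, 64))
W_small = W_base[:16, :16]                 # a NumPy view: no new parameters

x = rng.standard_normal(16)
y_small = W_small @ x                      # student forward pass on the shared slice

# one "update" to the shared block is immediately seen by both networks
W_base[:16, :16] += 0.1
print(np.allclose(W_small @ x, y_small + 0.1 * x.sum()))  # True
```

    Training both networks jointly through such shared parameters is what lets AN compress the student while regularizing the teacher in a single run.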
    Novelty Search in Representational Space for Sample Efficient Exploration. (arXiv:2009.13579v3 [cs.LG] UPDATED)
    We present a new approach for efficient exploration which leverages a low-dimensional encoding of the environment learned with a combination of model-based and model-free objectives. Our approach uses intrinsic rewards that are based on the distance of nearest neighbors in the low dimensional representational space to gauge novelty. We then leverage these intrinsic rewards for sample-efficient exploration with planning routines in representational space for hard exploration tasks with sparse rewards. One key element of our approach is the use of information theoretic principles to shape our representations in a way so that our novelty reward goes beyond pixel similarity. We test our approach on a number of maze tasks, as well as a control problem and show that our exploration approach is more sample-efficient compared to strong baselines.
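    The nearest-neighbour novelty reward can be sketched in a few lines (the value of k and the Euclidean metric in latent space are illustrative choices):

```python
import numpy as np

def novelty_reward(z, memory, k=5):
    """Intrinsic reward = mean distance from the encoded state z to its k
    nearest neighbours among previously visited encodings (higher = more novel)."""
    d = np.linalg.norm(memory - z, axis=1)
    return np.sort(d)[:k].mean()

rng = np.random.default_rng(0)
memory = rng.normal(0, 1, size=(200, 8))       # visited states in latent space
familiar = memory[0] + 0.01 * rng.normal(size=8)
novel = 10 * np.ones(8)                        # far from everything seen so far
print(novelty_reward(novel, memory) > novelty_reward(familiar, memory))  # True
```

    Because the distances are computed in the learned low-dimensional representation rather than pixel space, the reward reflects semantic rather than visual novelty.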
    Statistical-Computational Trade-offs in Tensor PCA and Related Problems via Communication Complexity. (arXiv:2204.07526v1 [math.ST])
    Tensor PCA is a stylized statistical inference problem introduced by Montanari and Richard to study the computational difficulty of estimating an unknown parameter from higher-order moment tensors. Unlike its matrix counterpart, Tensor PCA exhibits a statistical-computational gap, i.e., a sample size regime where the problem is information-theoretically solvable but conjectured to be computationally hard. This paper derives computational lower bounds on the run-time of memory bounded algorithms for Tensor PCA using communication complexity. These lower bounds specify a trade-off among the number of passes through the data sample, the sample size, and the memory required by any algorithm that successfully solves Tensor PCA. While the lower bounds do not rule out polynomial-time algorithms, they do imply that many commonly-used algorithms, such as gradient descent and power method, must have a higher iteration count when the sample size is not large enough. Similar lower bounds are obtained for Non-Gaussian Component Analysis, a family of statistical estimation problems in which low-order moment tensors carry no information about the unknown parameter. Finally, stronger lower bounds are obtained for an asymmetric variant of Tensor PCA and related statistical estimation problems. These results explain why many estimators for these problems use a memory state that is significantly larger than the effective dimensionality of the parameter of interest.
    Soft Truncation: A Universal Training Technique of Score-based Diffusion Model for High Precision Score Estimation. (arXiv:2106.05527v4 [cs.LG] UPDATED)
    Recent advances in diffusion models bring state-of-the-art performance on image generation tasks. However, empirical results from previous research on diffusion models suggest an inverse correlation between density estimation and sample generation performance. This paper argues that the inverse correlation arises because density estimation is mostly driven by small diffusion times, whereas sample generation mainly depends on large diffusion times. However, training the score network on both small and large diffusion times is demanding because of the loss imbalance issue. To successfully train the score network on both small and large diffusion times, this paper introduces a training technique, Soft Truncation, that softens the truncation time for every mini-batch update and is universally applicable to any type of diffusion model. It turns out that Soft Truncation is equivalent to a diffusion model with a general weight, and we prove the variational bound of the general weighted diffusion model. In view of this variational bound, Soft Truncation becomes a natural way to train the score network. In experiments, Soft Truncation achieves state-of-the-art performance on the CIFAR-10, CelebA, CelebA-HQ $256\times 256$, and STL-10 datasets.  ( 2 min )
    Latent Gaussian Model Boosting. (arXiv:2105.08966v4 [cs.LG] UPDATED)
    Latent Gaussian models and boosting are widely used techniques in statistics and machine learning. Tree-boosting shows excellent prediction accuracy on many data sets, but potential drawbacks are that it assumes conditional independence of samples, produces discontinuous predictions for, e.g., spatial data, and it can have difficulty with high-cardinality categorical variables. Latent Gaussian models, such as Gaussian process and grouped random effects models, are flexible prior models which explicitly model dependence among samples and which allow for efficient learning of predictor functions and for making probabilistic predictions. However, existing latent Gaussian models usually assume either a zero or a linear prior mean function which can be an unrealistic assumption. This article introduces a novel approach that combines boosting and latent Gaussian models to remedy the above-mentioned drawbacks and to leverage the advantages of both techniques. We obtain increased prediction accuracy compared to existing approaches in both simulated and real-world data experiments.  ( 2 min )
    Two-Step Meta-Learning for Time-Series Forecasting Ensemble. (arXiv:2011.10545v2 [stat.ML] UPDATED)
    The amount of historical data collected keeps increasing, and automatic forecasting of time series for business intelligence applications is in high demand. While no single time-series modeling method is universal to all types of dynamics, forecasting using an ensemble of several methods is often seen as a compromise. Instead of fixing ensemble diversity and size, we propose to predict these aspects adaptively using meta-learning. Meta-learning here considers two separate random forest regression models, built on 390 time-series features, to rank 22 univariate forecasting methods and recommend ensemble size. The forecasting ensemble is then formed from the methods ranked best, and forecasts are pooled using either a simple or a weighted average (with weights corresponding to reciprocal ranks). The proposed approach was tested on 12561 micro-economic time series (expanded to 38633 for various forecasting horizons) of the M4 competition, where meta-learning outperformed the Theta and Comb benchmarks by relative forecasting errors for all data types and horizons. The best overall results were achieved by weighted pooling, with a symmetric mean absolute percentage error of 9.21% versus 11.05% obtained using the Theta method.  ( 2 min )
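The pooling step described above can be sketched as follows; the forecast values are made up for illustration, and only the reciprocal-rank weighting mirrors the text.

```python
import numpy as np

def pooled_forecast(forecasts, weighted=True):
    """forecasts: array of shape (k, horizon), row 0 = best-ranked method."""
    k = forecasts.shape[0]
    if weighted:
        w = 1.0 / np.arange(1, k + 1)   # reciprocal-rank weights 1, 1/2, 1/3, ...
        w /= w.sum()                    # normalize so the weights sum to one
    else:
        w = np.full(k, 1.0 / k)         # simple average
    return w @ forecasts

f = np.array([[10.0, 11.0],             # hypothetical forecasts, best method first
              [12.0, 13.0],
              [20.0, 21.0]])
print(pooled_forecast(f))               # pulled toward the best-ranked method
print(pooled_forecast(f, weighted=False))
```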
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v3 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often involves more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions involve hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and the Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We link the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.  ( 2 min )
    A Statistical Decision-Theoretical Perspective on the Two-Stage Approach to Parameter Estimation. (arXiv:2204.00036v2 [stat.ME] UPDATED)
    One of the most important problems in system identification and statistics is how to estimate the unknown parameters of a given model. Optimization methods and specialized procedures, such as Expectation-Maximization (EM), can be used when the likelihood function can be computed. For situations where one can only simulate from a parametric model, but the likelihood is difficult or impossible to evaluate, a technique known as the Two-Stage (TS) Approach can be applied to obtain reliable parametric estimates. Unfortunately, there is currently a lack of theoretical justification for TS. In this paper, we propose a statistical decision-theoretical derivation of TS, which leads to Bayesian and Minimax estimators. We also show how to apply the TS approach to models for independent and identically distributed samples, by computing quantiles of the data as a first step and using a linear function as the second stage. The proposed method is illustrated via numerical simulations.  ( 2 min )
    Bayesian Nonparametrics for Sparse Dynamic Networks. (arXiv:1607.01624v2 [stat.ML] UPDATED)
    In this paper we propose a Bayesian nonparametric approach to modelling sparse time-varying networks. A positive parameter is associated with each node of a network, which models the sociability of that node. Sociabilities are assumed to evolve over time and are modelled via a dynamic point process model. The model is able to capture the long-term evolution of the sociabilities. Moreover, it yields sparse graphs, where the number of edges grows subquadratically with the number of nodes. The evolution of the sociabilities is described by a tractable time-varying generalised gamma process. We provide some theoretical insights into the model and apply it to three datasets: a simulated network, a network of hyperlinks between communities on Reddit, and a network of co-occurrences of words in Reuters news articles after the September 11th attacks.  ( 2 min )
    Enforcing fairness in private federated learning via the modified method of differential multipliers. (arXiv:2109.08604v2 [cs.LG] UPDATED)
    Federated learning with differential privacy, or private federated learning, provides a strategy to train machine learning models while respecting users' privacy. However, differential privacy can disproportionately degrade the performance of the models on under-represented groups, as these parts of the distribution are difficult to learn in the presence of noise. Existing approaches for enforcing fairness in machine learning models have considered the centralized setting, in which the algorithm has access to the users' data. This paper introduces an algorithm to enforce group fairness in private federated learning, where users' data does not leave their devices. First, the paper extends the modified method of differential multipliers to empirical risk minimization with fairness constraints, thus providing an algorithm to enforce fairness in the central setting. Then, this algorithm is extended to the private federated learning setting. The proposed algorithm, \texttt{FPFL}, is tested on a federated version of the Adult dataset and an "unfair" version of the FEMNIST dataset. The experiments on these datasets show how private federated learning accentuates unfairness in the trained models, and how FPFL is able to mitigate such unfairness.  ( 2 min )
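The modified method of differential multipliers that the paper builds on can be illustrated on a toy constrained problem. Everything below (the objective, the constraint, the step sizes) is invented for illustration; the paper applies the method to fairness-constrained empirical risk minimization in the federated setting.

```python
# Toy MDMM sketch: minimize f(x) subject to g(x) = 0 by descending on x and
# ascending on the multiplier, with a damping term c*g(x)*g'(x) for stability.
# Here f(x) = (x - 2)^2 and g(x) = x - 1, so the constrained optimum is x = 1.

f_grad = lambda x: 2 * (x - 2)
g = lambda x: x - 1
g_grad = lambda x: 1.0

x, lam, lr, c = 0.0, 0.0, 0.05, 2.0
for _ in range(2000):
    # descent on x: gradient of the augmented Lagrangian f + lam*g + (c/2)*g^2
    x -= lr * (f_grad(x) + lam * g_grad(x) + c * g(x) * g_grad(x))
    lam += lr * g(x)                # ascent on the multiplier
print(round(x, 3), round(lam, 3))   # x -> 1, lam -> -f'(1) = 2
```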
    Towards a Unified Framework for Uncertainty-aware Nonlinear Variable Selection with Theoretical Guarantees. (arXiv:2204.07293v1 [stat.ML])
    We develop a simple and unified framework for nonlinear variable selection that incorporates model uncertainty and is compatible with a wide range of machine learning models (e.g., tree ensembles, kernel methods, and neural networks). In particular, for a learned nonlinear model $f(\mathbf{x})$, we consider quantifying the importance of an input variable $\mathbf{x}^j$ using the integrated gradient measure $\psi_j = \Vert \frac{\partial}{\partial \mathbf{x}^j} f(\mathbf{x})\Vert^2_2$. We then (1) provide a principled approach for quantifying variable selection uncertainty by deriving its posterior distribution, and (2) show that the approach is generalizable even to non-differentiable models such as tree ensembles. Rigorous Bayesian nonparametric theorems are derived to guarantee the posterior consistency and asymptotic uncertainty of the proposed approach. Extensive simulations confirm that the proposed algorithm outperforms existing classic and recent variable selection methods.  ( 2 min )
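The importance measure $\psi_j$ can be sketched numerically; the toy model f and the finite-difference estimator below are illustrative stand-ins, not the paper's Bayesian machinery.

```python
import numpy as np

def f(X):
    # toy nonlinear model: depends strongly on x0, weakly on x1, not at all on x2
    return np.sin(X[:, 0]) + 0.1 * X[:, 1] ** 2

def variable_importance(model, X, eps=1e-5):
    """Estimate psi_j = ||df/dx_j||^2 via central differences, averaged over X."""
    n, d = X.shape
    psi = np.zeros(d)
    for j in range(d):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        grad_j = (model(Xp) - model(Xm)) / (2 * eps)
        psi[j] = np.mean(grad_j ** 2)
    return psi

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
psi = variable_importance(f, X)
print(psi)  # x0 dominates, x2 is ~0
```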
    Tighter Theory for Local SGD on Identical and Heterogeneous Data. (arXiv:1909.04746v4 [cs.LG] UPDATED)
    We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H=1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.  ( 2 min )
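A minimal sketch of the local SGD scheme under analysis, with invented quadratic worker objectives; H is the number of local steps between communication rounds, as in the text.

```python
import numpy as np

# Assumed setup: worker m holds the loss f_m(x) = ||x - b_m||^2 / 2, so the
# heterogeneous local optima are the rows b_m and the global optimum is their mean.

def local_sgd(b, H=5, rounds=50, lr=0.1):
    M, d = b.shape                        # M workers, dimension d
    x = np.zeros(d)                       # global iterate
    for _ in range(rounds):
        locals_ = np.tile(x, (M, 1))      # broadcast the global model
        for _ in range(H):                # each worker takes H local gradient steps
            locals_ -= lr * (locals_ - b) # grad of ||x - b_m||^2/2 is x - b_m
        x = locals_.mean(axis=0)          # communication: average local iterates
    return x

rng = np.random.default_rng(1)
b = rng.normal(size=(4, 3))               # heterogeneous local optima
x = local_sgd(b)
print(x, b.mean(axis=0))                  # converges to the average optimum
```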
    Causal Transformer for Estimating Counterfactual Outcomes. (arXiv:2204.07258v1 [cs.LG])
    Estimating counterfactual outcomes over time from observational data is relevant for many applications (e.g., personalized medicine). Yet, state-of-the-art methods build upon simple long short-term memory (LSTM) networks, thus rendering inference for complex, long-range dependencies challenging. In this paper, we develop a novel Causal Transformer for estimating counterfactual outcomes over time. Our model is specifically designed to capture complex, long-range dependencies among time-varying confounders. For this, we combine three transformer subnetworks with separate inputs for time-varying covariates, previous treatments, and previous outcomes into a joint network with in-between cross-attentions. We further develop a custom, end-to-end training procedure for our Causal Transformer. Specifically, we propose a novel counterfactual domain confusion loss to address confounding bias: it aims to learn adversarially balanced representations, so that they are predictive of the next outcome but non-predictive of the current treatment assignment. We evaluate our Causal Transformer on synthetic and real-world datasets, where it achieves superior performance over current baselines. To the best of our knowledge, this is the first work to propose a transformer-based architecture for estimating counterfactual outcomes from longitudinal data.  ( 2 min )
    Warped Dynamic Linear Models for Time Series of Counts. (arXiv:2110.14790v2 [stat.ME] UPDATED)
    Dynamic Linear Models (DLMs) are commonly employed for time series analysis due to their versatile structure, simple recursive updating, ability to handle missing data, and probabilistic forecasting. However, the options for count time series are limited: Gaussian DLMs require continuous data, while Poisson-based alternatives often lack sufficient modeling flexibility. We introduce a novel semiparametric methodology for count time series by warping a Gaussian DLM. The warping function has two components: a (nonparametric) transformation operator that provides distributional flexibility and a rounding operator that ensures the correct support for the discrete data-generating process. We develop conjugate inference for the warped DLM, which enables analytic and recursive updates for the state space filtering and smoothing distributions. We leverage these results to produce customized and efficient algorithms for inference and forecasting, including Monte Carlo simulation for offline analysis and an optimal particle filter for online inference. This framework unifies and extends a variety of discrete time series models and is valid for natural counts, rounded values, and multivariate observations. Simulation studies illustrate the excellent forecasting capabilities of the warped DLM. The proposed approach is applied to a multivariate time series of daily overdose counts and demonstrates both modeling and computational successes.  ( 2 min )
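The two-component warping can be sketched as follows; the choice of expm1 as the inverse transformation and flooring to the nonnegative integers as the rounding operator are illustrative assumptions (the paper treats the transformation nonparametrically).

```python
import numpy as np

def warp(z, g_inv=np.expm1):
    """Map latent Gaussian draws to counts: round after an inverse transformation."""
    return np.maximum(np.floor(g_inv(z)), 0).astype(int)

rng = np.random.default_rng(2)
T = 200
theta = np.zeros(T)
for t in range(1, T):                        # latent AR(1) state evolution
    theta[t] = 0.9 * theta[t - 1] + rng.normal(scale=0.3)
z = theta + rng.normal(scale=0.2, size=T)    # Gaussian observation equation
y = warp(z + 2.0)                            # shift so counts are mostly nonzero
print(y[:10])                                # a nonnegative integer time series
```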
    Universal approximation property of invertible neural networks. (arXiv:2204.07415v1 [cs.LG])
    Invertible neural networks (INNs) are neural network architectures that are invertible by design. Thanks to their invertibility and the tractability of their Jacobians, INNs have various machine learning applications such as probabilistic modeling, generative modeling, and representation learning. However, their attractive properties often come at the cost of restricting the layer designs, which poses a question about their representation power: can we use these models to approximate sufficiently diverse functions? To answer this question, we have developed a general theoretical framework to investigate the representation power of INNs, building on a structure theorem of differential geometry. The framework simplifies the approximation problem of diffeomorphisms, which enables us to show the universal approximation properties of INNs. We apply the framework to two representative classes of INNs, namely Coupling-Flow-based INNs (CF-INNs) and Neural Ordinary Differential Equations (NODEs), and elucidate their high representation power despite the restrictions on their architectures.  ( 2 min )
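As a concrete instance of the restricted-but-invertible layers in question, here is a single affine coupling layer, the building block of CF-INNs; the subnetworks s and t below are toy stand-ins for learned networks.

```python
import numpy as np

def coupling_forward(x, s, t):
    """Split x into halves; transform the second half conditioned on the first."""
    x1, x2 = x[: len(x) // 2], x[len(x) // 2 :]
    y2 = x2 * np.exp(s(x1)) + t(x1)      # invertible by construction
    return np.concatenate([x1, y2])

def coupling_inverse(y, s, t):
    y1, y2 = y[: len(y) // 2], y[len(y) // 2 :]
    x2 = (y2 - t(y1)) * np.exp(-s(y1))   # exact inverse, no equation solving
    return np.concatenate([y1, x2])

s = lambda h: np.tanh(h)                 # stand-ins for the learned subnetworks
t = lambda h: 0.5 * h

x = np.array([0.3, -1.2, 0.7, 2.0])
y = coupling_forward(x, s, t)
print(np.allclose(coupling_inverse(y, s, t), x))  # True: exact reconstruction
```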
    Distributed Reconstruction of Noisy Pooled Data. (arXiv:2204.07491v1 [cs.IT])
    In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.  ( 2 min )
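The noisy query model can be simulated in a few lines; for illustration we reconstruct by least squares with rounding, a simple baseline, not the paper's greedy distributed algorithm.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, sigma = 50, 200, 0.5
x = rng.integers(0, 2, size=n)               # hidden state bits
A = rng.integers(0, 2, size=(m, n))          # random query sets (one row per query)
y = A @ x + rng.normal(scale=sigma, size=m)  # noisy query results (sum + Gaussian)

# Baseline reconstruction: least squares, then round to {0, 1}.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
x_hat = np.clip(np.round(x_hat), 0, 1).astype(int)
print((x_hat == x).mean())                   # fraction of correctly recovered bits
```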

  • Open

    [R][P] Mask Transfiner for High-Quality Instance Segmentation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [P] Spoonfy: Turn any foreign-language video into effective listening practice
    Video (Despacito, slightly NSFW): https://drive.google.com/file/d/12qYKv_yaqGr9GWvHPtE9ng2foJVPfpoE/view?usp=sharing Code & more info: https://github.com/athairus/SpoonfyDemo Discord: https://discord.gg/7wcZZzeSQk Spoonfy is essentially the so-called Telenovela method (learning languages through subtitled video) on steroids: this demo uses a finetuned version of Facebook's M2M100 model to translate Spanish to English (finetuned to do literal translation instead of ordinary translation) and a wav2vec2 model to get Spanish word timings, to present the literal (aka word-by-word) translations as karaoke-style lyrics. What sets Spoonfy apart from other solutions is the way it leverages the massive body of existing subtitled content out there to create learning material. Also because it's FOSS. More details in the code's README. I've been working on this project by myself for a few months now, and I hope you see the potential behind it like I do! If so (but also if not), I'd love to hear what you think. And I'd love to get your help improving on what I already built. I have plenty of ideas for how to make the translations even more accurate, the system more robust & able to handle more sources of content (YouTube, TikTok, Blu-Rays), etc. Thanks for checking it out! submitted by /u/athairus [link] [comments]  ( 1 min )
    [D] Current work on knowledge representation with your preference, and use of language models
    In robotics and autonomous systems, knowledge representation is an important aspect. What are your favorite methods for knowledge representation: formal logic, graphs, or something else, and why do you like that kind of representation? Considering the success of large language models, isn't it a good time to use them in a new kind of representation, so that robots or similar systems can make better decisions in an environment? I still feel there is no common consensus in the community on the correct way to do knowledge representation; correct me if I am wrong. submitted by /u/projekt_treadstone [link] [comments]  ( 1 min )
    [D] DALL-E 2 vs Disco Diffusion - SHOWDOWN!
    submitted by /u/nin_artificial [link] [comments]  ( 1 min )
    WACV vs. BMVC [R]
    How do they compare in terms of communities, prestige, competitiveness, and impact? I have a paper accepted to a CVPR workshop and am considering extending it and submitting to one of these. The work is based on explainability in medical vision. It's more methods-oriented rather than large-scale experiments. What are your suggestions? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [N] [P] Access 100+ image, video & audio datasets in seconds with one line of code & stream them while training ML models with Activeloop Hub (more at docs.activeloop.ai, description & links in the comments below)
    submitted by /u/davidbun [link] [comments]  ( 5 min )
    [D] Is it ok to promise a dataset in your paper, get published and then not release it?
    Recently, I decided to explore NeRF and found a very interesting dataset in the NeRS paper of 3D models, which was published in NeurIPS 2021 four months ago. Authors promised to release their dataset: The filtered dataset with anonymized personally identifiable information (e.g. license plates and phone numbers), masks, initial camera poses, and optimized NeRS cameras will be made available on the project page. However, if you check their project page or github repo — there is nothing there. I do not have much experience in machine learning, but wonder whether it's ok to do this? My thinking was that it is something to look down upon, but in this case it is done by Carnegie Mellon University (which is a top-tier one in ML?) on a top-tier conference (NeurIPS 2021). So I assume it's fine? submitted by /u/throwmeaway-account [link] [comments]  ( 5 min )
    [D] Wasserstein distance lipschitz vs gaussian distribution
    Hi, I heard there are different ways to calculate the Wasserstein distance in a neural network context. First, we can convert the 1-d Wasserstein loss to its dual representation and constrain the function's size (to make it a Lipschitz function); we need to clip the weights to make our model a Lipschitz function. Second, we can make the neural network output a Gaussian distribution and calculate the easy closed form, using the network output as the mean and covariance matrix. So, what are the advantages and disadvantages when comparing them? It may sound ambiguous, but I have not seen a study that compares the two on representation quality, computation, etc. Thank you for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
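In 1-d both routes reduce to simple computations, which makes a small side-by-side possible. One caveat: the two quantities below are different metrics (the empirical 1-Wasserstein distance from order statistics versus the closed-form 2-Wasserstein distance between the two Gaussians), so the comparison is only illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
mu1, s1, mu2, s2 = 0.0, 1.0, 2.0, 1.5
a = np.sort(rng.normal(mu1, s1, 100_000))
b = np.sort(rng.normal(mu2, s2, 100_000))

# Route 1: empirical 1-D W1 from sorted samples (quantile coupling is optimal in 1-D).
w1_empirical = np.mean(np.abs(a - b))
# Route 2: closed-form W2 between two 1-D Gaussians: sqrt((mu1-mu2)^2 + (s1-s2)^2).
w2_gaussian = np.sqrt((mu1 - mu2) ** 2 + (s1 - s2) ** 2)
print(w1_empirical, w2_gaussian)
```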
    [D] What do you use to make your blog/personal websites?
    I've noticed a lot of folks in ML have a personal website that doubles as a blog to write about their work/projects. As someone looking to build their own website along the same lines, I'm looking for frameworks to try and build it with. What framework do you use to design your site? submitted by /u/SwiftLynx [link] [comments]  ( 3 min )
    [Discussion] Interpretable Neural Network ... ?
    Hi All! I've been working on a linear method that extracts signals from images by learning a set of composable image filters. It can recompose an image using these filters, as seen on this biological histology tissue (real on the right, recomposed on the left): https://preview.redd.it/czgdk6edx0u81.png?width=768&format=png&auto=webp&s=8768f93a749fff7dc41e576a74403096e113942e Because it is a linear method that learns image filters, I had an idea: what if some components of a neural network could be replaced with a learnable set of filters? For those not in the know, image filters are similar to masks that upweight some parts of the image and downweight other parts - similar to a highlighter to select text and a pen to cross out words. I show how in the figure below: https://preview.redd.it/2azco14ex0u81.jpg?width=499&format=pjpg&auto=webp&s=f3795fec06daa13da61ec155159a0ad865524530 Learning a set of image filters with a neural network is a good idea, as neural networks are much more flexible and are considered to be "universal function approximators". So I wrote up a Pytorch package to pass the neural network feature weights from Convolutions and Max Pooling into the linear method to learn a relevant set of filters - results are comparable even on CIFAR10. The caveat is that there is no ReLU, no other activation functions, and no Dropout - only one main linear layer that learns filters... an interpretable neural network! Results are all here (including an ipynb comparing with a base CNN and VGG16): https://github.com/AskExplain/Interpretable-Neural-Net I'll update the GitHub with some figures of why the single layer is interpretable soon. In the meantime - discuss! submitted by /u/TryToExplainHow [link] [comments]  ( 3 min )
    [P] New Graph Data Augmentation Library
    Hello! I recently built grafog, a graph data augmentation library on top of PyTorch Geometric. You can chain together graph augmentations as done in albumentations or torchvision.transforms. Check it out: https://github.com/rish-16/grafog It has the following augmentations: Random Node Drop, Random Edge Drop, Normalize Features, MixUp Strategy, Node Feature Masking, and Edge Feature Masking. https://preview.redd.it/c53r7gkrk0u81.png?width=689&format=png&auto=webp&s=8fbe668e82571a7fe5de9ebb5e4690dbd34032bb https://preview.redd.it/5zrj4gkrk0u81.png?width=863&format=png&auto=webp&s=5bd02ea4adaf86b8911fa89372be9f05f9010536 Happy augmenting! submitted by /u/rish-16 [link] [comments]
    [N]: How does OpenAI's DALL-E 2 work?
    submitted by /u/giugiacaglia [link] [comments]
    [D] What is the opposite of an ablative study?
    I have the feeling that this question may be really stupid, but I'll ask it anyway. In ML we often see ablative studies. What is the opposite of one called? In other words: a study that aims to improve a model, and once an improvement is reached, this new model is taken as the basis for further investigations? submitted by /u/Rogitus [link] [comments]  ( 2 min )
  • Open

    Build & share machine learning apps directly in browser using Gradio in Python
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    AI Trippy Dream 19 - Exploring a Colorful Maze
    submitted by /u/LordPewPew777 [link] [comments]
    a better Boids simulation: An artificial life simulation of the flock of birds
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    New AI upscaler tool
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    credit scoring for companies
    Hello everyone, I'm a newbie, so pardon me if you find that my question is stupid. I'm working on a project; here's its description in a nutshell: classify companies by whether they're going to go bankrupt or not and, based on the probability of default (probability of bankruptcy), give each company a score. For example: 88 percent to bankrupt, score is D; 21 percent to bankrupt, score is B; 3 percent to bankrupt, score is A. My question is: what kinds of models should I test? Should I go for machine learning algorithms such as logistic regression, kNN, or SVM? Should I go for neural networks (ANN)? Or can I use deep learning models like an MLP or a probabilistic neural network? Any guidance or advice will be appreciated. Thanks a lot. submitted by /u/YeccAnon4 [link] [comments]  ( 1 min )
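The last step you describe (mapping a predicted probability of bankruptcy to a letter grade) is independent of the model choice; any classifier that outputs probabilities (logistic regression, gradient boosting, a neural net) can feed into it. The cut-offs below are made up to match your example.

```python
def grade(p_default):
    """Map a predicted probability of default to a letter grade (hypothetical cut-offs)."""
    if p_default >= 0.50:
        return "D"
    if p_default >= 0.15:
        return "B"
    return "A"

for p in (0.88, 0.21, 0.03):
    print(p, grade(p))   # matches the post's example: D, B, A
```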
  • Open

    LSTM for time series prediction
    Hi, I am doing a project where I have to predict sales for a company, and I am having some trouble with my LSTM model in Python. All the research I have done tells me that an LSTM is as good as, if not better than, an ARIMA model for forecasting on time series data, but my LSTM is significantly worse than my ARIMA model. Would it be possible for anyone to help me see if I have implemented it right? I have used both Tensorflow and Pytorch, and both are way worse than the ARIMA model. submitted by /u/magnussendjoko [link] [comments]  ( 2 min )
  • Open

    Learning style of play (different agents' actions) in the same offline RL environment?
    Hi, everyone. I'm a relative novice in RL, so bear with me as I try to formulate my question. I'm working on a chess bot that can play moves like a player (imitate their style of play) chosen from a set of players (that the bot is trained on), if I give the bot the previous x moves. Using more technical terms, I'm trying to create an agent that is given a sequence of states and actions of another agent (player) and some representation of who that agent (player) is, and predicts the next action (continues playing in the style of that player). I'm fairly certain this is an RL problem, as I don't know how to frame it as a supervised learning problem (I might be wrong). I've seen some papers that abstract offline RL as a sequence modeling problem (Decision Transformer, Trajectory Transformer), so I'm fairly certain I should continue in a similar manner. But I'm having a hard time trying to understand how to treat the difference in players. My instinct was to use some representation of the player as the reward, but then how would I even optimize for it or even give it as an input? Do I just add the player as a feature in the game state, but then what should be the reward? Has this been done before, or something similar? I couldn't really find any paper or code that worked on differentiating the training data by who made it (I might not be wording it correctly). submitted by /u/OverhypeUnderdeliver [link] [comments]  ( 3 min )
  • Open

    Using Data Warehousing as a Service (DWaaS) To Improve Customer Experience
    Data has become a huge area of business, helping businesses to drive their intelligence, make better decisions, and formulate strategic plans for future growth. The post Using Data Warehousing as a Service (DWaaS) To Improve Customer Experience appeared first on Data Science Central.  ( 7 min )
    ML classifies gravitational-wave glitches with high accuracy
    The LIGO observatory can detect astronomical events from billions of light years away. Terabytes of complex daily data make human analysis impossible. A new study applies a neural network with up to 97% classification accuracy. Caltech/MIT’s LIGO, the largest gravitational-wave observatory in the world, collects data on minute space-time ripples from cataclysmic astronomical events like colliding black… Read More »ML classifies gravitational-wave glitches with high accuracy The post ML classifies gravitational-wave glitches with high accuracy appeared first on Data Science Central.  ( 4 min )
    Zero Trust Principles: What is Zero Trust Model?
    The central principle of the Zero Trust model is based on the authentication and verification of every device connecting to the network before they are trusted. Former Forrester analyst and veteran of the high-technology world, John Kindervag, who has been actively part of a wide array of network technology projects, coined the term “Zero Trust”… Read More »Zero Trust Principles: What is Zero Trust Model? The post Zero Trust Principles: What is Zero Trust Model? appeared first on Data Science Central.  ( 4 min )

  • Open

    [R] Questions about ACL Rolling Review
    A few questions about ACL ARR: - If you request to reassign a reviewer, would the editor aim to reassign all three reviewers, or would he reassign only that particular reviewer? Assume you have given a valid reason for reassignment and the editor is convinced. - If you request to reassign a reviewer, can the new reviewer see the previous reviews/scores before submitting his own review? Or would he get access to the previous reviews only after submitting his own? I already know (have heard) that in many cases reviewers are not available, and it becomes inevitable to get an entirely new set of reviews. I already know this. But my questions are about the case where reviewer availability is not an issue. Just trying to find out how things are managed. submitted by /u/sim_inf [link] [comments]  ( 1 min )
    [R][P] MultiMAE: Multi-modal Multi-task Masked Autoencoders + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    [Discussion] Is it possible to find a SWE job with a DS master's degree? Or would it be possible to make the transition later on?
    Say that a master's student graduates from a DS program with a heavy focus on data and CS (so they know the basics of CS like data structures and programming, and have also studied courses like data mining, big data analytics, and machine learning): what are their possible job openings and relatively easy positions to get into? My understanding is really rudimentary, so feel free to correct me: Data scientist: this should be the most fitting and easiest-to-get-interview position. Difficulty level 1/5. Data analyst: same as 1. Difficulty level 1/5. Data engineer: same as 1. Difficulty level 1/5. Machine learning engineer: has a higher bar than 1, 2, and 3, and it's very difficult to get interviews without the proper background and work experience. So it's very difficult for a master's graduate in DS to become one, but it's quite possible for a DS/DE (much less so for a DA) to move into MLE positions. Difficulty level 3/5. Software engineer: it's a totally different realm with very little skill overlap with 1, 2, and 3, so it's very hard for DS students to make the transition or land a SWE job. Difficulty level 5/5. submitted by /u/Competitive_Map_935 [link] [comments]  ( 1 min )
    [Project] Open-source playground to generate images from text using DALL-E Mini
    submitted by /u/koryoislie [link] [comments]
    [D] Incorporating node features into GNNs?
    Hey all, I am looking to learn more about how to incorporate node features with their embeddings for training. Specifically, I am working with gene-gene interaction networks, and also want to include RNA-sequencing quantifications. If anyone has a good introductory resource so I can familiarize myself with the process, I would really appreciate it! submitted by /u/PM_ME_A_ONELINER [link] [comments]  ( 1 min )
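    In most message-passing GNN layers, the node feature matrix (here, per-gene RNA-seq quantifications) simply serves as the initial node embeddings, which each layer averages over neighborhoods and projects. A minimal NumPy sketch of one such layer, with a toy gene-gene graph and random weights (purely illustrative, not a specific library's API):

```python
import numpy as np

def gcn_layer(adj, features, weight):
    """One graph-convolution step: average each node's neighborhood
    (including itself) and apply a learned linear map + ReLU."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops
    deg = a_hat.sum(axis=1, keepdims=True)   # node degrees
    h = (a_hat / deg) @ features @ weight    # mean aggregation + projection
    return np.maximum(h, 0.0)                # ReLU

# Toy gene-gene interaction graph: 3 genes, gene 0 interacts with 1 and 2.
adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]], dtype=float)
expr = np.array([[1.0, 0.0],     # per-gene features, e.g. expression levels
                 [0.0, 1.0],
                 [0.5, 0.5]])
rng = np.random.default_rng(0)
w = rng.normal(size=(2, 4))
emb = gcn_layer(adj, expr, w)
print(emb.shape)  # (3, 4): one 4-dim embedding per gene
```

Stacking such layers (with trained weights) gives feature-aware embeddings; libraries like PyTorch Geometric follow the same pattern via an `x` feature argument.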
    [D] Moderation uniformity in subreddit
    This isn't meant to be a rant. Rather far from it. Yesterday I posted a legitimate question about database choices in r/MachineLearning. It was about what technical choices ML members are currently using for large-scale data ingestion in a continual learning environment. The post was removed. That post was neither (1) a beginner question nor (2) offensive; it (3) aimed to be a constructive discussion suitable as a mid-range ML query and (4) was marked with the appropriate flair. I finally posted it elsewhere. Yet today I see questions about transitions between DS -> MLE and quirky lab-group names derived from ML terms. These aren't even research questions. https://www.reddit.com/r/MachineLearning/comments/u503vz/d_is_it_easier_to_transition_to_mle_as_a_ds_or_swe/ https://www.reddit.com/r/MachineLearning/comments/u5091o/d_do_you_know_any_funny_team_names_with/ How is this fair moderation, genuinely? How can we improve the noise filter or do better moderation? submitted by /u/mlbloke [link] [comments]  ( 2 min )
    [D] Counterfactual Fairness
    So I watched this old video by Microsoft Research: https://www.youtube.com/watch?v=psA4U6nhZ70 To summarize, it uses the fairness criterion that sensitive attribute A will give the same prediction regardless of its value, when using counterfactuals. That is, whether you're male or female shouldn't influence the model's predictions. The idea seems decent at first glance. But what if the "bias" or "unfairness" that the model creates based on sensitive attribute A isn't caused by a dataset bias but rather reflects a real signal in the data? The model proposed by Microsoft Research doesn't take into consideration that the prediction on the sensitive attribute A does not necessarily consist of ONLY unfairness. They simply define it as such. Is such an algorithmic design choice not exactly one of the flaws that we seek to eliminate? Assuming that not all of the imbalance in predictions by the model on the sensitive attribute A is caused by "unfairness" but that some of it is caused by an inherent difference, then are they not introducing direct human bias and unfairness into their model by explicitly designing the system to fit their own human (and political) bias? Don't get me wrong; the opposite is just as bad. Assuming that ALL of the imbalance in the prediction on the sensitive attribute A is caused by "inherent differences" is just as bad. Do you know of anyone who has tackled this in a good manner? How would you even begin to estimate how much is due to an "inherent difference" and how much is due to "bias, unfairness, noise" (or otherwise)? submitted by /u/caahel [link] [comments]  ( 4 min )
    [D] Paper Explained - Transformer Memory as a Differentiable Search Index (Full Video Walkthrough)
    https://youtu.be/qlB0TPBQ7YY Search engines work by building an index and then looking up things in it. Usually, that index is a separate data structure. In keyword search, we build and store reverse indices. In neural search, we build nearest-neighbor indices. This paper does something different: It directly trains a Transformer to return the ID of the most relevant document. No similarity search over embeddings or anything like this is performed, and no external data structure is needed, as the entire index is essentially captured by the model's weights. The paper experiments with various ways of representing documents and training the system, which works surprisingly well! OUTLINE: 0:00 - Intro 0:45 - Sponsor: Diffgram 1:35 - Paper overview 3:15 - The search problem, classic and neural 8:15 - Seq2seq for directly predicting document IDs 11:05 - Differentiable search index architecture 18:05 - Indexing 25:15 - Retrieval and document representation 33:25 - Training DSI 39:15 - Experimental results 49:25 - Comments & Conclusions ​ Paper: https://arxiv.org/abs/2202.06991 submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [R] Useful method to train models for adversarial robustness
    submitted by /u/IncredibleMac [link] [comments]
    [P] Comparing Default VS Custom Reward Function for Optimal Health Management of a DeepRL Agent Playing Tekken
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 1 min )
    [D] Spotify's Podcast Search Explained
    I wrote this article breaking down how Spotify have applied semantic search to enhance podcast discovery. I find it super interesting to see the approach Spotify have used in terms of data sources, model fine-tuning, and vector search - and wanted to show how to almost replicate it. Let me know if you have any thoughts on their approach! submitted by /u/jamescalam [link] [comments]
    [R] Machine learning in management of precautionary closures caused by lipophilic biotoxins
    In this work, we present a deep study of alternatives for improving mussel aquaculture with very noisy and unbalanced data https://www.sciencedirect.com/science/article/pii/S0168169922002733 submitted by /u/ennanco [link] [comments]
    [P] RR-GCN now supports multi-modal learning!
    We have just released v0.0.2 of our RR-GCN. This release includes support for multi-modal learning. Node embeddings can now be initialised with literal information or pre-trained embeddings for text and image data. Go check out our notebooks that show how we can achieve state-of-the-art performance on several benchmark datasets in less than one minute. Moreover, and more importantly, the representations produced by our RR-GCN are unsupervised and parameter-free (i.e. no training is required), making it possible to re-use them for multiple downstream ML tasks with high predictive performance. https://github.com/predict-idlab/RR-GCN submitted by /u/givdwiel [link] [comments]  ( 1 min )
    [D] Paper Explained – SEER explained: Vision Models more Robust & Fair when pretrained on UNCURATED images!?
    https://youtu.be/XHAoV_nKr1o This video explains the 10 billion parameter SEER model from MetaAI by Goyal et al. 2022. Paper link: https://arxiv.org/abs/2202.08360 Official implementation: https://github.com/facebookresearch/vissl/tree/main/projects/SEER Short description: The 10 billion parameter SEER model from u/MetaAI is *fairer*, even though it is trained on *uncurated* data. How so? Check out our take on this. Outline: 00:00 Training on uncurated data 01:12 Diffgram (Sponsor) 01:46 Toxicity in large models 02:43 What to do against model toxicity? 03:53 SEER model explained 06:52 SEER is fairer. But how? submitted by /u/AICoffeeBreak [link] [comments]  ( 1 min )
    Is there an AI I could use to create an artificial Terence McKenna chatbot?
    He’s basically this wacky dead philosopher with 1000s of hours of his lectures on YT, and I was thinking it may be possible to create an artificial AI personality of his from all of his recorded speech? Would there be a simple enough program I could download, or anything of the sort? submitted by /u/Vaporshots [link] [comments]  ( 1 min )
    Boids: An artificial life simulation of a flock of birds
    submitted by /u/Seitoh [link] [comments]  ( 1 min )
    AI Trippy Dream 37 - Psychedelic Special Request
    submitted by /u/LordPewPew777 [link] [comments]
    Amazing Generation
    Looks amazing. The vibe is there. What do you think? How did he achieve this? Created by hand or through AI? https://www.tiktok.com/@ai.metascape/video/7086451191151971586 submitted by /u/PillowG1rl [link] [comments]
    Little Baby Chibi Lucy Loud
    submitted by /u/VIRUS-AOTOXIN [link] [comments]
    Artificial Intelligence is the Future of Deterrence
    submitted by /u/much_successes [link] [comments]
    LinkedIn Open-Sources ‘Feathr’, Its Feature Store To Simplify Machine Learning (ML) Feature Management And Improve Developer Productivity
    The LinkedIn research team has recently open-sourced its feature store, Feathr, created to simplify machine learning (ML) feature management and increase developer productivity. Feathr is used by dozens of LinkedIn applications to define features, compute them for training, deploy them in production, and share them across consumers. Compared to previous application-specific feature pipeline solutions, Feathr users reported significantly reduced time required to add new features to model training and improved runtime performance. Hundreds of ML models run on LinkedIn in Search, Feed, and Ads applications. Thousands of features about entities in the Economic Graph, such as companies, job postings, and LinkedIn members, power the models. The most time-consuming aspects of handling the ML applications at scale have been preparing and managing features. Continue reading the summary Github: https://github.com/linkedin/feathr LinkedIn Blog: https://engineering.linkedin.com/blog/2022/open-sourcing-feathr—linkedin-s-feature-store-for-productive-m submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Artificial Nightmares: Crypt Walker || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    I created a DIY python package to ensemble multimodal models
    Multimodal: A python package to ensemble speech, text, etc. models and build new applications. Sample Applications: Speech Named Entity Anonymizer, Speech Question Answering, Speech Generation Code: kritiksoman/Multimodal: Listen. Write. Speak. Read. Think. (github.com) submitted by /u/kritiksoman [link] [comments]
    Rigorous treatment of MDPs, Bellman, etc. in continuous spaces?
    I am looking for a book/monograph that goes through all the basics of reinforcement learning for continuous spaces with mathematical rigor. The classic RL book from Sutton/Barto and the new RL theory book from Agarwal/Jiang/Kakade/Sun both stick to finite MDPs except for special cases like linear MDPs and the LQR. I assume that a general statement of the fundamentals for continuous spaces will require grinding through a lot of details on existence, measurability, suprema vs. maxima, etc., that are not issues in the finite case. Is this why these authors avoid it? clarifying edit: I don't need to go all the way to continuous time - just state and action spaces. Maybe one of Bertsekas's books? submitted by /u/quadprog [link] [comments]  ( 1 min )
    NEED HELP MAKING A BASIC PYTHON MODEL
    I have a 2-column dataset: “Date”, “Result”. The Result column produces a 0 or 1 for each date. I need to make a reinforcement model that will predict whether or not the next result will be a 0 or 1. It needs to be done in a Jupyter notebook. submitted by /u/EffectiveBug4629 [link] [comments]  ( 1 min )
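    Strictly speaking this is a supervised (not reinforcement) prediction task, and the usual starting point is a simple baseline before reaching for RL. A minimal dependency-free sketch (hypothetical data, purely illustrative) that predicts the next result as the majority of recent results:

```python
from collections import Counter

def predict_next(results, window=10):
    """Majority-vote baseline: predict the next 0/1 outcome as the most
    common value among the last `window` observations."""
    recent = results[-window:]
    counts = Counter(recent)
    return max(counts, key=counts.get)

# Hypothetical "Result" column values in date order
history = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]
print(predict_next(history))  # -> 1 (1 is the majority of recent values)
```

If this baseline cannot be beaten by a learned model, the Result series is likely close to random with a fixed bias.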
    From machine learning to sequential decision problems (reinforcement learning)
    Any reinforcement learning problem can be modeled as a sequential decision problem (SDP), which can always be modeled as a Markov decision process. An example of an SDP is a multiarmed bandit problem, where the state is the vector of beliefs about the performance of each arm (or beliefs about a continuous parametric model). Decisions are made by a policy, and there are four classes of policies. For some reason, the RL community tends to focus on just one of the four classes (UCB policies, which fall in the class of cost function approximations), but there are entire communities using each of the other three classes. See chapter 7 of my new book for a complete summary of the four classes for pure learning problems (aka bandit problems). See https://tinyurl.com/RLandSO/ Curious why Sutton and Barto (2nd edition) cover bandit problems in chapter 2, and then introduce MDPs in chapter 3. A bandit problem *is* an MDP! submitted by /u/powell-sda [link] [comments]  ( 1 min )
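    For readers unfamiliar with the UCB policies mentioned above, a minimal UCB1 sketch on a two-armed Bernoulli bandit (a toy setup of my own, not taken from the book):

```python
import math, random

def ucb1(pull, n_arms, horizon, seed=0):
    """UCB1: play each arm once, then pick the arm maximizing
    empirical mean + sqrt(2 ln t / n_pulls)."""
    random.seed(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1              # initialization: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # running mean update
    return counts

# Bernoulli arms with success probabilities 0.2 and 0.8
probs = [0.2, 0.8]
counts = ucb1(lambda a: 1.0 if random.random() < probs[a] else 0.0,
              n_arms=2, horizon=500)
print(counts)  # the better arm (index 1) should receive most pulls
```

Note how the state of this policy is exactly the belief vector (counts, means) described above, which is what makes the bandit an MDP over belief states.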
    Policy gradient vs. Policy iteration?
    Hello, I'm currently learning about MDPs and machine learning. I have a few questions that might be trivial or obvious but I can't find many concrete answers online: -Are policy gradient and policy iteration similar/the same? From what I can gather, policy iteration is a type or subset of policy gradient algorithm, is this correct? -Are all policy learning methods less effective for large state spaces? From my understanding you need to use some kind of value function iteration and heuristic function for larger state spaces because you can't encounter all states enough times to converge on an optimal policy -Does convergence on a policy/value function find a local or global optimum? With neural nets, simple backpropagation may only find a local minimum for the cost function, is this true of MDP/RL iteration algorithms? Thanks!! submitted by /u/egad_a_mouse [link] [comments]  ( 2 min )
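    On the first question: policy iteration is usually classed as a dynamic-programming method (alternating exact policy evaluation with greedy improvement), distinct from policy-gradient methods, which do gradient ascent on a parameterized policy. A minimal sketch of policy iteration on a hypothetical 2-state, 2-action deterministic MDP:

```python
import numpy as np

# Hypothetical MDP: P[s, a] -> next state, R[s, a] -> reward (deterministic).
P = np.array([[0, 1],
              [0, 1]])
R = np.array([[0.0, 1.0],
              [2.0, 0.0]])
gamma = 0.9

def policy_iteration(P, R, gamma):
    """Classic policy iteration: solve the linear system for V under the
    current policy, then improve greedily; converges in finitely many steps."""
    n_s, n_a = R.shape
    pi = np.zeros(n_s, dtype=int)
    while True:
        # Policy evaluation: V = R_pi + gamma * P_pi V  (exact solve)
        r_pi = R[np.arange(n_s), pi]
        p_pi = np.zeros((n_s, n_s))
        p_pi[np.arange(n_s), P[np.arange(n_s), pi]] = 1.0
        v = np.linalg.solve(np.eye(n_s) - gamma * p_pi, r_pi)
        # Policy improvement: greedy one-step lookahead
        q = R + gamma * v[P]
        new_pi = q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return pi, v
        pi = new_pi

pi, v = policy_iteration(P, R, gamma)
print(pi)  # -> [1 0]: optimal action per state
```

For finite MDPs this converges to the global optimum; policy-gradient methods on nonconvex parameterizations offer no such blanket guarantee, which speaks to the third question.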
    How to create a layer without inputs in tensorflow.
    In deep RL algorithms like PPO, a continuous stochastic policy is represented by a Normal distribution. For this, the recommended way of creating the Normal distribution is to get the mean by passing the state through the NN and then using a state-independent layer to predict log_std. This layer which predicts log_std should be trainable using backprop, just like biases. So how do you create this layer in TensorFlow 2? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
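    One common pattern (to the best of my knowledge) is not a layer at all but a free trainable variable, e.g. `tf.Variable(tf.zeros(act_dim), trainable=True)` created in the model's `__init__`; any variable tracked by the model is updated by backprop just like a bias. A framework-agnostic NumPy sketch of the resulting diagonal-Gaussian log-probability, with `log_std` playing the role of that state-independent parameter vector:

```python
import numpy as np

def diag_gaussian_logprob(actions, mean, log_std):
    """Log-density of actions under N(mean, diag(exp(log_std))^2).
    `log_std` is a free parameter vector shared across states, while
    `mean` would come from the policy network given the state."""
    std = np.exp(log_std)
    z = (actions - mean) / std
    return -0.5 * np.sum(z**2 + 2 * log_std + np.log(2 * np.pi), axis=-1)

act_dim = 2
log_std = np.zeros(act_dim)        # the "input-less layer": one trainable vector
mean = np.array([[0.0, 0.0]])      # stand-in for the NN output
actions = np.array([[0.0, 0.0]])
print(diag_gaussian_logprob(actions, mean, log_std))
```

In TF2 the same formula written with `tf` ops lets gradients flow into `log_std` automatically under `tf.GradientTape`.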
    Machine Learning-based Anomaly Detection in Optical Fiber Monitoring. (arXiv:2204.07059v1 [cs.NI])
    Secure and reliable data communication in optical networks is critical for high-speed Internet. However, optical fibers, serving as the data transmission medium providing connectivity to billions of users worldwide, are prone to a variety of anomalies resulting from hard failures (e.g., fiber cuts) and malicious physical attacks (e.g., optical eavesdropping (fiber tapping)), etc. Such anomalies may cause network disruption and thereby induce huge financial and data losses, compromise the confidentiality of optical networks through unauthorized access to the carried data, or gradually degrade network operations. Therefore, it is essential to implement efficient anomaly detection, diagnosis, and localization schemes for enhancing the availability and reliability of optical networks. In this paper, we propose a data-driven approach to accurately and quickly detect, diagnose, and localize fiber anomalies, including fiber cuts and optical eavesdropping attacks. The proposed method combines autoencoder-based anomaly detection with an attention-based bidirectional gated recurrent unit algorithm, whereby the former is used for fault detection and the latter is adopted for fault diagnosis and localization once an anomaly is detected by the autoencoder. We verify the efficiency of our proposed approach by experiments under various anomaly scenarios using real operational data. The experimental results demonstrate that: (i) the autoencoder detects any fiber fault or anomaly with an F1 score of 96.86%; and (ii) the attention-based bidirectional gated recurrent unit algorithm identifies the detected anomalies with an average accuracy of 98.2%, and localizes the faults with an average root mean square error of 0.19 m.  ( 2 min )
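    The autoencoder half of such a pipeline boils down to thresholding reconstruction error on data the model was fit to. A minimal sketch using PCA as a linear stand-in for the autoencoder (synthetic data, not the paper's model):

```python
import numpy as np

def fit_linear_ae(X, k):
    """PCA as a linear autoencoder: the encoder/decoder are the top-k
    principal directions of the (assumed healthy) training traces."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, vt[:k]

def recon_error(X, mu, comps):
    Z = (X - mu) @ comps.T            # encode
    Xh = Z @ comps + mu               # decode
    return np.linalg.norm(X - Xh, axis=1)

rng = np.random.default_rng(0)
# Correlated "healthy" monitoring traces (synthetic stand-in)
normal = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))
mu, comps = fit_linear_ae(normal, k=2)
thresh = np.percentile(recon_error(normal, mu, comps), 99)

anomaly = rng.normal(size=(1, 5)) * 50.0   # hypothetical fault signature
flag = recon_error(anomaly, mu, comps) > thresh
print(flag)
```

A trained nonlinear autoencoder replaces the PCA encode/decode, but the detection logic (error above a percentile threshold) is the same.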
    Robust No-Regret Learning in Min-Max Stackelberg Games. (arXiv:2203.14126v2 [cs.GT] UPDATED)
    The behavior of no-regret learning algorithms is well understood in two-player min-max (i.e, zero-sum) games. In this paper, we investigate the behavior of no-regret learning in min-max games with dependent strategy sets, where the strategy of the first player constrains the behavior of the second. Such games are best understood as sequential, i.e., min-max Stackelberg, games. We consider two settings, one in which only the first player chooses their actions using a no-regret algorithm while the second player best responds, and one in which both players use no-regret algorithms. For the former case, we show that no-regret dynamics converge to a Stackelberg equilibrium. For the latter case, we introduce a new type of regret, which we call Lagrangian regret, and show that if both players minimize their Lagrangian regrets, then play converges to a Stackelberg equilibrium. We then observe that online mirror descent (OMD) dynamics in these two settings correspond respectively to a known nested (i.e., sequential) gradient descent-ascent (GDA) algorithm and a new simultaneous GDA-like algorithm, thereby establishing convergence of these algorithms to Stackelberg equilibrium. Finally, we analyze the robustness of OMD dynamics to perturbations by investigating online min-max Stackelberg games. We prove that OMD dynamics are robust for a large class of online min-max games with independent strategy sets. In the dependent case, we demonstrate the robustness of OMD dynamics experimentally by simulating them in online Fisher markets, a canonical example of a min-max Stackelberg game with dependent strategy sets.  ( 2 min )
    Open-Set Recognition: a Good Closed-Set Classifier is All You Need?. (arXiv:2110.06207v2 [cs.CV] CROSS LISTED)
    The ability to identify whether or not a test sample belongs to one of the semantic classes in a classifier's training set is critical to practical deployment of the model. This task is termed open-set recognition (OSR) and has received significant attention in recent years. In this paper, we first demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes. We find that this relationship holds across loss objectives and architectures, and further demonstrate the trend both on the standard OSR benchmarks as well as on a large-scale ImageNet evaluation. Second, we use this correlation to boost the performance of a maximum logit score OSR 'baseline' by improving its closed-set accuracy, and with this strong baseline achieve state-of-the-art on a number of OSR benchmarks. Similarly, we boost the performance of the existing state-of-the-art method by improving its closed-set accuracy, but the resulting discrepancy with the strong baseline is marginal. Our third contribution is to present the 'Semantic Shift Benchmark' (SSB), which better respects the task of detecting semantic novelty, in contrast to other forms of distribution shift also considered in related sub-fields, such as out-of-distribution detection. On this new evaluation, we again demonstrate that there is negligible difference between the strong baseline and the existing state-of-the-art. Project Page: https://www.robots.ox.ac.uk/~vgg/research/osr/  ( 2 min )
    Gradient boosting for convex cone predict and optimize problems. (arXiv:2204.06895v1 [cs.LG])
    Many problems in engineering and statistics involve both predictive forecasting and decision-based optimization. Traditionally, predictive models are optimized independently from the final decision-based optimization problem. In contrast, a `smart, predict then optimize' (SPO) framework optimizes prediction models to explicitly minimize the final downstream decision loss. In this paper we present dboost, a gradient boosting algorithm for training prediction model ensembles to minimize decision regret. The dboost framework supports any convex optimization program that can be cast as convex quadratic cone program and gradient boosting is performed by implicit differentiation of a custom fixed-point mapping. To our knowledge, the dboost framework is the first general purpose implementation of gradient boosting to predict and optimize problems. Experimental results comparing with state-of-the-art SPO methods show that dboost can further reduce out-of-sample decision regret.  ( 2 min )
    YOLO-Pose: Enhancing YOLO for Multi Person Pose Estimation Using Object Keypoint Similarity Loss. (arXiv:2204.06806v1 [cs.CV])
    We introduce YOLO-pose, a novel heatmap-free approach for joint detection and 2D multi-person pose estimation in an image based on the popular YOLO object detection framework. Existing heatmap-based two-stage approaches are sub-optimal as they are not end-to-end trainable and training relies on a surrogate L1 loss that is not equivalent to maximizing the evaluation metric, i.e. Object Keypoint Similarity (OKS). Our framework allows us to train the model end-to-end and optimize the OKS metric itself. The proposed model learns to jointly detect bounding boxes for multiple persons and their corresponding 2D poses in a single forward pass, thus bringing in the best of both top-down and bottom-up approaches. The proposed approach doesn't require the postprocessing of bottom-up approaches to group detected keypoints into a skeleton as each bounding box has an associated pose, resulting in an inherent grouping of the keypoints. Unlike top-down approaches, multiple forward passes are done away with since all persons are localized along with their pose in a single inference. YOLO-pose achieves new state-of-the-art results on COCO validation (90.2% AP50) and test-dev set (90.3% AP50), surpassing all existing bottom-up approaches in a single forward pass without flip test, multi-scale testing, or any other test time augmentation. All experiments and results reported in this paper are without any test time augmentation, unlike traditional approaches that use flip-test and multi-scale testing to boost performance. Our training codes will be made publicly available at https://github.com/TexasInstruments/edgeai-yolov5 and https://github.com/TexasInstruments/edgeai-yolox  ( 2 min )
    MARF: Multiscale Adaptive-switch Random Forest for Leg Detection with 2D Laser Scanners. (arXiv:2204.06833v1 [cs.RO])
    For 2D laser-based tasks, e.g., people detection and people tracking, leg detection is usually the first step. Thus, it carries great weight in determining the performance of people detection and people tracking. However, many leg detectors ignore the inevitable noise and the multiscale characteristics of the laser scan, which makes them sensitive to the unreliable features of the point cloud and further degrades the performance of the leg detector. In this paper, we propose a multiscale adaptive-switch Random Forest (MARF) to overcome these two challenges. Firstly, the adaptive-switch decision tree is designed to use noise-sensitive features to conduct weighted classification and noise-invariant features to conduct binary classification, which makes our detector more robust to noise. Secondly, considering the multiscale property that the sparsity of the 2D point cloud is proportional to the length of laser beams, we design a multiscale random forest structure to detect legs at different distances. Moreover, the proposed approach allows us to discover a sparser human leg from point clouds than others. Consequently, our method shows an improved performance compared to other state-of-the-art leg detectors on the challenging Moving Legs dataset and retains the whole pipeline at a speed of 60+ FPS on low-computational laptops. Moreover, we further apply the proposed MARF to the people detection and tracking system, achieving a considerable gain in all metrics.  ( 2 min )
    Not All Patches are What You Need: Expediting Vision Transformers via Token Reorganizations. (arXiv:2202.07800v2 [cs.CV] UPDATED)
    Vision Transformers (ViTs) take all the image patches as tokens and construct multi-head self-attention (MHSA) among them. Complete leverage of these image tokens brings redundant computations since not all the tokens are attentive in MHSA. Examples include that tokens containing semantically meaningless or distractive image backgrounds do not positively contribute to the ViT predictions. In this work, we propose to reorganize image tokens during the feed-forward process of ViT models, which is integrated into ViT during training. For each forward inference, we identify the attentive image tokens between MHSA and FFN (i.e., feed-forward network) modules, which is guided by the corresponding class token attention. Then, we reorganize image tokens by preserving attentive image tokens and fusing inattentive ones to expedite subsequent MHSA and FFN computations. To this end, our method EViT improves ViTs from two perspectives. First, under the same amount of input image tokens, our method reduces MHSA and FFN computation for efficient inference. For instance, the inference speed of DeiT-S is increased by 50% while its recognition accuracy is decreased by only 0.3% for ImageNet classification. Second, by maintaining the same computational cost, our method empowers ViTs to take more image tokens as input for recognition accuracy improvement, where the image tokens are from higher resolution images. An example is that we improve the recognition accuracy of DeiT-S by 1% for ImageNet classification at the same computational cost of a vanilla DeiT-S. Meanwhile, our method does not introduce more parameters to ViTs. Experiments on the standard benchmarks show the effectiveness of our method. The code is available at https://github.com/youweiliang/evit  ( 2 min )
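    The token-reorganization step can be sketched in a few lines: rank image tokens by their class-token attention, keep the top-k, and fuse the rest into one attention-weighted token (toy numbers, not the released EViT code):

```python
import numpy as np

def reorganize_tokens(tokens, cls_attn, keep):
    """EViT-style reorganization: keep the `keep` tokens with the highest
    class-token attention and fuse the remaining (inattentive) tokens into
    a single token, weighted by their attention scores."""
    order = np.argsort(cls_attn)[::-1]          # tokens, most attentive first
    top, rest = order[:keep], order[keep:]
    w = cls_attn[rest] / cls_attn[rest].sum()   # attention-weighted fusion
    fused = (w[:, None] * tokens[rest]).sum(axis=0, keepdims=True)
    return np.concatenate([tokens[top], fused], axis=0)

rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))    # 6 image tokens, dim 4
cls_attn = np.array([0.30, 0.05, 0.25, 0.10, 0.20, 0.10])
out = reorganize_tokens(tokens, cls_attn, keep=3)
print(out.shape)  # (4, 4): 3 attentive tokens + 1 fused token
```

Downstream MHSA/FFN blocks then operate on 4 tokens instead of 6, which is where the inference speedup comes from.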
    Learning Task-Aware Energy Disaggregation: a Federated Approach. (arXiv:2204.06767v1 [cs.LG])
    We consider the problem of learning the energy disaggregation signals for residential load data. Such a task is referred to as non-intrusive load monitoring (NILM), and in order to find individual devices' power consumption profiles based on aggregated meter measurements, a machine learning model is usually trained on a large amount of training data coming from a number of residential homes. Yet collecting such residential load datasets requires both huge effort and customers' approval to share metering data, while load data coming from different regions or electricity users may exhibit heterogeneous usage patterns. Both practical concerns make training a single, centralized NILM model challenging. In this paper, we propose a decentralized and task-adaptive learning scheme for NILM tasks, where nested meta-learning and federated learning steps are designed for learning task-specific models collectively. Simulation results on a benchmark dataset validate the proposed algorithm's performance on efficiently inferring appliance-level consumption for a variety of homes and appliances.  ( 2 min )
    Ranking Feature-Block Importance in Artificial Multiblock Neural Networks. (arXiv:2109.10279v2 [cs.LG] UPDATED)
    In artificial neural networks, understanding the contributions of input features on the prediction fosters model explainability and delivers relevant information about the dataset. While typical setups for feature importance ranking assess input features individually, in this study, we go one step further and rank the importance of groups of features, denoted as feature-blocks. A feature-block can contain features of a specific type or features derived from a particular source, which are presented to the neural network in separate input branches (multiblock ANNs). This work presents three methods pursuing distinct strategies to rank features in multiblock ANNs by their importance: (1) a composite strategy building on individual feature importance rankings, (2) a knock-in, and (3) a knock-out strategy. While the composite strategy builds on state-of-the-art feature importance rankings, knock-in and knock-out strategies evaluate the block as a whole via a mutual information criterion. Our experiments consist of a simulation study validating all three approaches, followed by a case study on two distinct real-world datasets to compare the strategies. We conclude that each strategy has its merits for specific application scenarios.  ( 2 min )
    Finding MNEMON: Reviving Memories of Node Embeddings. (arXiv:2204.06963v1 [cs.LG])
    Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understanding the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph, without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.  ( 2 min )
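    The intuition behind embedding-based edge recovery can be illustrated with a naive nearest-neighbor baseline (my own illustration, not the paper's attack): nodes close in embedding space are guessed to be connected:

```python
import numpy as np

def knn_edge_recovery(emb, k):
    """Naive graph-recovery baseline: guess that each node is connected
    to its k nearest neighbors in embedding space."""
    d = np.linalg.norm(emb[:, None] - emb[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # ignore self-distance
    nbrs = np.argsort(d, axis=1)[:, :k]
    return {tuple(sorted((int(i), int(j))))
            for i in range(len(emb)) for j in nbrs[i]}

# Two tight clusters: nodes {0,1} and {2,3} sit close together in embedding
# space, so the baseline recovers edges (0,1) and (2,3).
emb = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(knn_edge_recovery(emb, k=1))
```

A real attack must do better than this (embeddings preserve structure less directly), but the baseline shows why access to the embedding matrix alone already leaks edges.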
    GM-TOuNN: Graded Multiscale Topology Optimization using Neural Networks. (arXiv:2204.06682v1 [cs.CE])
    Multiscale topology optimization (M-TO) entails generating an optimal global topology, and an optimal set of microstructures at a smaller scale, for a physics-constrained problem. With the advent of additive manufacturing, M-TO has gained significant prominence. However, generating optimal microstructures at various locations can be computationally very expensive. As an alternative, graded multiscale topology optimization (GM-TO) has been proposed where one or more pre-selected and graded (parameterized) microstructural topologies are used to fill the domain optimally. This leads to a significant reduction in computation while retaining many of the benefits of M-TO. A successful GM-TO framework must: (1) be capable of efficiently handling numerous pre-selected microstructures, (2) be able to continuously switch between these microstructures during optimization, (3) ensure that the partition of unity is satisfied, and (4) discourage microstructure mixing at termination. In this paper, we propose to meet these requirements by exploiting the unique classification capacity of neural networks. Specifically, we propose a graded multiscale topology optimization using neural-network (GM-TOuNN) framework with the following features: (1) the number of design variables is only weakly dependent on the number of pre-selected microstructures, (2) it guarantees partition of unity while discouraging microstructure mixing, and (3) it supports automatic differentiation, thereby eliminating manual sensitivity analysis. The proposed framework is illustrated through several examples.  ( 2 min )
    Medical Application of Geometric Deep Learning for the Diagnosis of Glaucoma. (arXiv:2204.07004v1 [eess.IV])
    Purpose: (1) To assess the performance of geometric deep learning (PointNet) in diagnosing glaucoma from a single optical coherence tomography (OCT) 3D scan of the optic nerve head (ONH); (2) To compare its performance to that obtained with a standard 3D convolutional neural network (CNN), and with a gold-standard glaucoma parameter, i.e. retinal nerve fiber layer (RNFL) thickness. Methods: 3D raster scans of the ONH were acquired with Spectralis OCT for 477 glaucoma and 2,296 non-glaucoma subjects at the Singapore National Eye Centre. All volumes were automatically segmented using deep learning to identify 7 major neural and connective tissues including the RNFL, the prelamina, and the lamina cribrosa (LC). Each ONH was then represented as a 3D point cloud with 1,000 points chosen randomly from all tissue boundaries. To simplify the problem, all ONH point clouds were aligned with respect to the plane and center of Bruch's membrane opening. Geometric deep learning (PointNet) was then used to provide a glaucoma diagnosis from a single OCT point cloud. The performance of our approach was compared to that obtained with a 3D CNN, and with RNFL thickness. Results: PointNet was able to provide a robust glaucoma diagnosis solely from the ONH represented as a 3D point cloud (AUC=95%). The performance of PointNet was superior to that obtained with a standard 3D CNN (AUC=87%) and with that obtained from RNFL thickness alone (AUC=80%). Discussion: We provide a proof-of-principle for the application of geometric deep learning in the field of glaucoma. Our technique requires significantly less information as input to perform better than a 3D CNN, and with an AUC superior to that obtained from RNFL thickness alone. Geometric deep learning may have wide applicability in the field of Ophthalmology.  ( 2 min )
    Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms. (arXiv:2204.06664v1 [stat.ML])
    Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in both the Bernoulli and multinomial settings.  ( 2 min )
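Once mean estimates for each data source are available, the underlying membership question (is a target point in the convex hull of the means?) reduces to a linear-programming feasibility check. A minimal sketch, assuming `numpy` and `scipy` are available and using the hypothetical helper name `in_convex_hull`; the paper's adaptive algorithms additionally account for estimation uncertainty:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(point, means):
    """Check whether `point` lies in the convex hull of the rows of `means`
    by solving a feasibility LP: find lambda >= 0 with sum(lambda) = 1 and
    means.T @ lambda = point."""
    k, _ = means.shape
    c = np.zeros(k)                                # zero objective: feasibility only
    A_eq = np.vstack([means.T, np.ones((1, k))])   # stack equality constraints
    b_eq = np.concatenate([point, [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * k, method="highs")
    return bool(res.success)

means = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(in_convex_hull(np.array([0.2, 0.2]), means))  # inside the triangle -> True
print(in_convex_hull(np.array([1.0, 1.0]), means))  # outside -> False
```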
    Word Embeddings Are Capable of Capturing Rhythmic Similarity of Words. (arXiv:2204.04833v2 [cs.CL] UPDATED)
    Word embedding systems such as Word2Vec and GloVe are well-known in deep learning approaches to NLP. This is largely due to their ability to capture semantic relationships between words. In this work we investigated their usefulness in capturing rhythmic similarity of words instead. The results show that the vectors these embeddings assign to rhyming words are more similar to each other than to those of other words. It is also revealed that GloVe performs relatively better than Word2Vec in this regard. We also propose a first-of-its-kind metric for quantifying the rhythmic similarity of a pair of words.  ( 2 min )
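As a toy illustration of the kind of comparison the study performs, one can measure cosine similarity between embedding vectors and check whether a rhyming pair is closer than an unrelated pair. The vectors below are synthetic stand-ins, not real Word2Vec or GloVe outputs:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy embeddings (hypothetical vectors, not outputs of a trained model):
rng = np.random.default_rng(0)
base = rng.normal(size=50)
emb = {
    "cat": base + 0.1 * rng.normal(size=50),   # "rhyming" pair: nearby vectors
    "hat": base + 0.1 * rng.normal(size=50),
    "logic": rng.normal(size=50),              # unrelated word
}

rhyme_sim = cosine_similarity(emb["cat"], emb["hat"])
other_sim = cosine_similarity(emb["cat"], emb["logic"])
print(rhyme_sim > other_sim)  # the rhyming pair is closer in this toy construction
```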
    BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks. (arXiv:2204.07054v1 [q-bio.NC])
    Mapping the connectome of the human brain using structural or functional connectivity has become one of the most pervasive paradigms for neuroimaging analysis. Recently, Graph Neural Networks (GNNs) motivated by geometric deep learning have attracted broad interest due to their established power for modeling complex networked data. Despite their established performance in other fields, there has not yet been a systematic study of how to design effective GNNs for brain network analysis. To bridge this gap, we present BrainGB, a benchmark for brain network analysis with GNNs. BrainGB standardizes the process by 1) summarizing brain network construction pipelines for both functional and structural neuroimaging modalities and 2) modularizing the implementation of GNN designs. We conduct extensive experiments on datasets across cohorts and modalities and recommend a set of general recipes for effective GNN designs on brain networks. To support open and reproducible research on GNN-based brain network analysis, we also host the BrainGB website at https://brainnet.us/ with models, tutorials, examples, as well as an out-of-the-box Python package. We hope that this work will provide useful empirical evidence and offer insights for future research in this novel and promising direction.  ( 2 min )
    Accelerated Policy Learning with Parallel Differentiable Simulation. (arXiv:2204.07137v1 [cs.LG])
    Deep reinforcement learning can generate complex control policies, but requires large amounts of training data to work effectively. Recent work has attempted to address this issue by leveraging differentiable simulators. However, inherent problems such as local minima and exploding/vanishing numerical gradients prevent these methods from being generally applied to control tasks with complex contact-rich dynamics, such as humanoid locomotion in classical RL benchmarks. In this work we present a high-performance differentiable simulator and a new policy learning algorithm (SHAC) that can effectively leverage simulation gradients, even in the presence of non-smoothness. Our learning algorithm alleviates problems with local minima through a smooth critic function, avoids vanishing/exploding gradients through a truncated learning window, and allows many physical environments to be run in parallel. We evaluate our method on classical RL control tasks, and show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms. In addition, we demonstrate the scalability of our method by applying it to the challenging high-dimensional problem of muscle-actuated locomotion with a large action space, achieving a greater than 17x reduction in training time over the best-performing established RL algorithm.  ( 2 min )
    Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings. (arXiv:2104.08928v2 [stat.ML] UPDATED)
    Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retailing to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient is tested positive for a disease. Intuitively, we expect that only a small number of domain-specific words may have new meanings/usages. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our estimator, proving that it can achieve the same accuracy (compared to not transfer learning) with substantially less domain-specific data when only a small number of embeddings are altered between domains. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate the effectiveness of our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.  ( 2 min )
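The group-sparse structure described above can be sketched with the group soft-thresholding (proximal) operator: each word's deviation from its source-domain embedding is shrunk as a whole group, so only a few domain-specific words end up with altered embeddings. This is an illustrative sketch of the penalty's effect, not the paper's full two-stage estimator:

```python
import numpy as np

def group_soft_threshold(Delta, lam):
    """Proximal step for a group-lasso penalty on per-word embedding shifts:
    each row of Delta (one word's deviation from its source-domain embedding)
    is shrunk toward zero as a whole, so most words keep their source
    embedding and only a few domain-specific words move."""
    norms = np.linalg.norm(Delta, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return scale * Delta

# Hypothetical deviations: one word (row 2) has a large domain-specific shift.
rng = np.random.default_rng(0)
Delta = rng.normal(size=(5, 3)) * np.array([[0.01], [0.01], [2.0], [0.01], [0.01]])
shrunk = group_soft_threshold(Delta, lam=0.5)
nonzero_rows = np.linalg.norm(shrunk, axis=1) > 0
print(nonzero_rows)  # only the large "domain-specific" row survives shrinkage
```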
    Latent Aspect Detection from Online Unsolicited Customer Reviews. (arXiv:2204.06964v1 [cs.CL])
    Within the context of review analytics, aspects are the features of products and services at which customers target their opinions and sentiments. Aspect detection helps product owners and service providers to identify shortcomings and prioritize customers' needs, and hence, maintain revenues and mitigate customer churn. Existing methods focus on detecting the surface form of an aspect by training supervised learning methods that fall short when aspects are latent in reviews. In this paper, we propose an unsupervised method to extract latent occurrences of aspects. Specifically, we assume that a customer undergoes a two-stage hypothetical generative process when writing a review: (1) deciding on an aspect amongst the set of aspects available for the product or service, and (2) writing the opinion words that are more interrelated to the chosen aspect from the set of all words available in a language. We employ latent Dirichlet allocation to learn the latent aspects distributions for generating the reviews. Experimental results on benchmark datasets show that our proposed method is able to improve the state of the art when the aspects are latent with no surface form in reviews.  ( 2 min )
    Neighborhood Attention Transformer. (arXiv:2204.07143v1 [cs.CV])
    We present Neighborhood Attention Transformer (NAT), an efficient, accurate and scalable hierarchical transformer that works well on both image classification and downstream vision tasks. It is built upon Neighborhood Attention (NA), a simple and flexible attention mechanism that localizes the receptive field for each query to its nearest neighboring pixels. NA is a localization of self-attention, and approaches it as the receptive field size increases. It is also equivalent in FLOPs and memory usage to Swin Transformer's shifted window attention given the same receptive field size, while being less constrained. Furthermore, NA includes local inductive biases, which eliminate the need for extra operations such as pixel shifts. Experimental results on NAT are competitive; NAT-Tiny reaches 83.2% top-1 accuracy on ImageNet with only 4.3 GFLOPs and 28M parameters, 51.4% mAP on MS-COCO and 48.4% mIoU on ADE20k. We will open-source our checkpoints, training script, configurations, and our CUDA kernel at: https://github.com/SHI-Labs/Neighborhood-Attention-Transformer .  ( 2 min )
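A minimal 1D sketch of the neighborhood-attention idea (each query attends only to its k nearest positions, with the window clamped at the sequence boundaries) can be written in plain numpy; the real NAT operates on 2D feature maps with learned projections and an optimized CUDA kernel:

```python
import numpy as np

def neighborhood_attention_1d(x, k=3):
    """Single-head neighborhood attention over a 1D sequence.

    Each query position attends only to its k nearest positions (a local
    window, clamped at the boundaries so every query sees exactly k keys),
    rather than the full sequence as in standard self-attention.
    x: (n, d) array of token features; projections are identity for brevity."""
    n, d = x.shape
    q, key, v = x, x, x
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(x)
    half = k // 2
    for i in range(n):
        lo = min(max(i - half, 0), n - k)     # clamp the k-wide window
        idx = np.arange(lo, lo + k)
        scores = (q[i] @ key[idx].T) * scale
        w = np.exp(scores - scores.max())     # numerically stable softmax
        w /= w.sum()
        out[i] = w @ v[idx]
    return out

x = np.random.default_rng(1).normal(size=(8, 4))
y = neighborhood_attention_1d(x, k=3)
print(y.shape)  # (8, 4)
```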
    Fix Bugs with Transformer through a Neural-Symbolic Edit Grammar. (arXiv:2204.06643v1 [cs.LG])
    We introduce NSEdit (neural-symbolic edit), a novel Transformer-based code repair method. Given only the source code that contains bugs, NSEdit predicts an editing sequence that can fix the bugs. The edit grammar is formulated as a regular language, and the Transformer uses it as a neural-symbolic scripting interface to generate editing programs. We modify the Transformer and add a pointer network to select the edit locations. An ensemble of rerankers is trained to re-rank the editing sequences generated by beam search. We fine-tune the rerankers on the validation set to reduce over-fitting. NSEdit is evaluated on various code repair datasets and achieved a new state-of-the-art accuracy ($24.04\%$) on the Tufano small dataset of the CodeXGLUE benchmark. NSEdit performs robustly as programs vary from package to package and when buggy programs are concrete. We conduct a detailed analysis of our methods and demonstrate the effectiveness of each component.  ( 2 min )
    Reflective Fiber Faults Detection and Characterization Using Long-Short-Term Memory. (arXiv:2204.07058v1 [cs.NI])
    To reduce operation-and-maintenance expenses (OPEX) and to ensure optical network survivability, optical network operators need to detect and diagnose faults in a timely manner and with high accuracy. With the rapid advancement of telemetry technology and data analysis techniques, data-driven approaches leveraging telemetry data to tackle the fault diagnosis problem have been gaining popularity due to their quick implementation and deployment. In this paper, we propose a novel multi-task learning model based on long short-term memory (LSTM) to detect, locate, and estimate the reflectance of fiber reflective faults (events) including the connectors and the mechanical splices by extracting insights from monitored data obtained by the optical time domain reflectometry (OTDR) principle commonly used for troubleshooting of fiber optic cables or links. The experimental results prove that the proposed method: (i) achieves a good detection capability and high localization accuracy within short measurement time even for low SNR values; and (ii) outperforms conventionally employed techniques.
    The Power of Linear Recurrent Neural Networks. (arXiv:1802.03308v6 [cs.LG] UPDATED)
    Recurrent neural networks are a powerful means to cope with time series. We show how linear, i.e., linearly activated recurrent neural networks (LRNNs) can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, the size of an LRNN can be reduced significantly in one step, after inspecting the eigenvalues of the network transition matrix, by taking only the most relevant components. Therefore, in contrast to other approaches, we learn not only the network weights but also the network architecture. LRNNs have interesting properties: They end up in ellipse trajectories in the long run and allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO), robotic soccer, and predicting stock prices. LRNNs outperform the previous state-of-the-art for the MSO task with a minimal number of units.  ( 2 min )
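The "solve a linear system instead of backpropagating" idea can be sketched directly: stack delay-embedded windows of the series, fit the linear transition matrix by least squares, and iterate the learned matrix to predict. Function names here are illustrative, not from the paper's code:

```python
import numpy as np

def fit_lrnn(values, n):
    """Fit a linear recurrence x_{t+1} = W x_t to a scalar time series,
    where x_t stacks the last n values (a delay embedding).  W is obtained
    by solving a linear least-squares system -- no backpropagation needed."""
    X = np.array([values[i:i + n] for i in range(len(values) - n)])
    Y = np.array([values[i + 1:i + n + 1] for i in range(len(values) - n)])
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)   # X @ B ~= Y
    return B.T                                  # so that x_{t+1} = W @ x_t

def predict(W, seed, steps):
    """Roll the learned transition matrix forward to predict future values."""
    x = np.array(seed, dtype=float)
    out = []
    for _ in range(steps):
        x = W @ x
        out.append(x[-1])
    return np.array(out)

# A (superimposed-)oscillator-style signal obeys a linear recurrence exactly:
t = np.arange(40)
series = np.sin(0.3 * t)
W = fit_lrnn(series, n=4)
pred = predict(W, series[-4:], steps=5)
true = np.sin(0.3 * np.arange(40, 45))
print(np.allclose(pred, true, atol=1e-5))  # True: the sinusoid is recovered
```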
    Incompleteness of graph convolutional neural networks for point clouds in three dimensions. (arXiv:2201.07136v2 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.  ( 2 min )
    Interpretability of Machine Learning Methods Applied to Neuroimaging. (arXiv:2204.07005v1 [cs.CV])
    Deep learning methods have become very popular for the processing of natural images, and were then successfully adapted to the neuroimaging field. As these methods are non-transparent, interpretability methods are needed to validate them and ensure their reliability. Indeed, it has been shown that deep learning models may obtain high performance even when using irrelevant features, by exploiting biases in the training set. Such undesirable situations can potentially be detected by using interpretability methods. Recently, many methods have been proposed to interpret neural networks. However, this domain is not mature yet. Machine learning users face two major issues when aiming to interpret their models: which method to choose, and how to assess its reliability? Here, we aim at providing answers to these questions by presenting the most common interpretability methods and metrics developed to assess their reliability, as well as their applications and benchmarks in the neuroimaging context. Note that this is not an exhaustive survey: we aimed to focus on the studies which we found to be the most representative and relevant.  ( 2 min )
    The MIT Supercloud Workload Classification Challenge. (arXiv:2204.05839v2 [cs.DC] UPDATED)
    High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become an increasingly larger share of the compute workloads, new approaches to optimized resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute workloads and their utilization characteristics, HPC systems may be able to better match available resources with the application demand. By leveraging datacenter instrumentation, it may be possible to develop AI-based approaches that can identify workloads and provide feedback to researchers and datacenter operators for improving operational efficiency. To enable this research, we released the MIT Supercloud Dataset, which provides detailed monitoring logs from the MIT Supercloud cluster. This dataset includes CPU and GPU usage by jobs, memory usage, and file system logs. In this paper, we present a workload classification challenge based on this dataset. We introduce a labelled dataset that can be used to develop new approaches to workload classification and present initial results based on existing approaches. The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads that can achieve higher accuracy than existing methods. Data and code will be made publicly available via the Datacenter Challenge website: https://dcc.mit.edu.  ( 2 min )
    Learning and controlling the source-filter representation of speech with a variational autoencoder. (arXiv:2204.07075v1 [cs.SD])
    Understanding and controlling latent representations in deep generative models is a challenging yet important problem for analyzing, transforming and generating various types of data. In speech processing, inspired by the anatomical mechanisms of phonation, the source-filter model considers that speech signals are produced from a few independent and physically meaningful continuous latent factors, among which the fundamental frequency $f_0$ and the formants are of primary importance. In this work, we show that the source-filter model of speech production naturally arises in the latent space of a variational autoencoder (VAE) trained in an unsupervised manner on a dataset of natural speech signals. Using only a few seconds of labeled speech signals generated with an artificial speech synthesizer, we experimentally illustrate that $f_0$ and the formant frequencies are encoded in orthogonal subspaces of the VAE latent space and we develop a weakly-supervised method to accurately and independently control these speech factors of variation within the learned latent subspaces. Without requiring additional information such as text or human-labeled data, this results in a deep generative model of speech spectrograms that is conditioned on $f_0$ and the formant frequencies, and which is applied to the transformation of speech signals.  ( 2 min )
    Solving AC Power Flow with Graph Neural Networks under Realistic Constraints. (arXiv:2204.07000v1 [cs.LG])
    In this paper we propose a graph neural network architecture for solving the AC power flow problem under realistic constraints. While the energy transition is changing the energy industry to a digitalized and decentralized energy system, the challenges are increasingly shifting to the distribution grid level to integrate new loads and generation technologies. To ensure a safe and resilient operation of distribution grids, AC power flow calculations are the means of choice to determine grid operating limits or analyze grid asset utilization in planning procedures. In our approach we demonstrate the development of a framework which makes use of graph neural networks to learn the physical constraints of the power flow. We present our model architecture on which we perform unsupervised training to learn a general solution of the AC power flow formulation that is independent of the specific topologies and supply tasks used for training. Finally, we demonstrate, validate and discuss our results on medium voltage benchmark grids.  ( 2 min )
    Generative power of a protein language model trained on multiple sequence alignments. (arXiv:2204.07110v1 [q-bio.BM])
    Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated to protein structure and function. They thus open the possibility for generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly uses the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences generally score better than those generated by Potts models, and even than natural sequences, for homology, coevolution and structure-based measures. Moreover, MSA Transformer better reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data than Potts models, although Potts models better reproduce first- and second-order statistics. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.
    Matrix Completion with Heterogeneous Cost. (arXiv:2203.12120v2 [cs.LG] UPDATED)
    The matrix completion problem has been studied broadly under many underlying conditions. The problem has been explored under adaptive or non-adaptive, exact or estimation, single-phase or multi-phase, and many other categories. In most of these cases, the observation cost of each entry is uniform and has the same cost across the columns. However, in many real-life scenarios, we could expect elements from distinct columns or distinct positions to have a different cost. In this paper, we explore this generalization under adaptive conditions. We approach the problem under two different cost models. The first is that entries from different columns have different observation costs, but, within the same column, each entry has a uniform cost. The second is that any two entries may have different observation costs, regardless of whether they lie in the same column. We provide a complexity analysis of our algorithms along with tightness guarantees.
    Semi-Supervised Convolutive NMF for Automatic Piano Transcription. (arXiv:2202.04989v2 [cs.SD] UPDATED)
    Automatic Music Transcription, which consists in transforming an audio recording of a musical performance into symbolic format, remains a difficult Music Information Retrieval task. In this work, which focuses on piano transcription, we propose a semi-supervised approach using low-rank matrix factorization techniques, in particular Convolutive Nonnegative Matrix Factorization. In the semi-supervised setting, only a single recording of each individual note is required. We show on the MAPS dataset that the proposed semi-supervised CNMF method performs better than state-of-the-art low-rank factorization techniques and a little worse than supervised deep learning state-of-the-art methods, while however suffering from generalization issues.
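The semi-supervised setting can be sketched with plain (non-convolutive) NMF for brevity: the note-template dictionary W is held fixed (in practice it would be learned from the single isolated-note recordings), and only the activations H are updated with the classic multiplicative rule. The helper name and toy data are hypothetical:

```python
import numpy as np

def transcribe(V, W, n_iter=500):
    """Estimate note activations H >= 0 such that V ~= W @ H, keeping the
    note-template dictionary W fixed (the only supervision the
    semi-supervised setting needs).  Uses the classic multiplicative
    update for the Euclidean NMF objective."""
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1]))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)   # multiplicative update keeps H >= 0
    return H

# Toy "spectrogram": two note templates and a known activation pattern.
W = np.array([[1.0, 0.0], [0.5, 0.2], [0.0, 1.0]])     # freq bins x notes
H_true = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # notes x frames
V = W @ H_true
H = transcribe(V, W)
print(np.allclose(W @ H, V, atol=1e-2))  # reconstruction recovers the mixture
```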
    HCFL: A High Compression Approach for Communication-Efficient Federated Learning in Very Large Scale IoT Networks. (arXiv:2204.06760v1 [cs.LG])
    Federated learning (FL) is a new artificial intelligence concept that enables Internet-of-Things (IoT) devices to learn a collaborative model without sending the raw data to centralized nodes for processing. Despite numerous advantages, low computing resources at IoT devices and high communication costs for exchanging model parameters make applications of FL in massive IoT networks very limited. In this work, we develop a novel compression scheme for FL, called high-compression federated learning (HCFL), for very large scale IoT networks. HCFL can reduce the data load for FL processes without changing their structure and hyperparameters. In this way, we not only significantly reduce communication costs, but also make intensive learning processes more adaptable on low-computing-resource IoT devices. Furthermore, we investigate a relationship between the number of IoT devices and the convergence level of the FL model and thereby better assess the quality of the FL process. We demonstrate our HCFL scheme in both simulations and mathematical analyses. Our theoretical results can serve as a minimum level of satisfaction, proving that the FL process can achieve good performance when a given configuration is met. Therefore, we show that HCFL is applicable in any FL-integrated networks with numerous IoT devices.
    Reinforcement Learning Policy Recommendation for Interbank Network Stability. (arXiv:2204.07134v1 [econ.GN])
    In this paper we analyze the effect of a policy recommendation on the performance of an artificial interbank market. Financial institutions stipulate lending agreements following a public recommendation and their individual information. The former, modeled by a reinforcement learning optimal policy trying to maximize the long term fitness of the system, gathers information on the economic environment and directs economic actors to create credit relationships based on the optimal choice between a low interest rate or a high liquidity supply. The latter, based on the agents' balance sheets, makes it possible to determine the liquidity supply and interest rate that the banks optimally offer on the market. Based on the combination of the public and the private signal, financial institutions create or cut their credit connections over time via a preferential attachment evolving procedure able to generate a dynamic network. Our results show that the emergence of a core-periphery interbank network, combined with a certain level of homogeneity in the size of lenders and borrowers, is essential to ensure the resilience of the system. Moreover, the reinforcement learning optimal policy recommendation plays a crucial role in mitigating systemic risk with respect to alternative policy instruments.
    Constrained Deep One-Class Feature Learning For Classifying Imbalanced Medical Images. (arXiv:2111.10610v2 [eess.IV] UPDATED)
    Medical image data are usually imbalanced across different classes. One-class classification has attracted increasing attention to address the data imbalance problem by distinguishing the samples of the minority class from the majority class. Previous methods generally aim to either learn a new feature space to map training samples together or to fit training samples by autoencoder-like models. These methods mainly focus on capturing either compact or descriptive features, where the information of the samples of a given one class is not sufficiently utilized. In this paper, we propose a novel deep learning-based method to learn compact features by adding constraints on the bottleneck features, and to preserve descriptive features by training an autoencoder at the same time. Through jointly optimizing the constraining loss and the autoencoder's reconstruction loss, our method can learn more relevant features associated with the given class, making the majority and minority samples more distinguishable. Experimental results on three clinical datasets (MRI breast images, FFDM breast images, and chest X-ray images) show state-of-the-art performance compared to previous methods.
    Supplementation of deep neural networks with simplified physics-based features to increase model prediction accuracy. (arXiv:2204.06764v1 [cs.ET])
    To improve predictive models for STEM applications, supplemental physics-based features computed from input parameters are introduced into single and multiple layers of a deep neural network (DNN). While many studies focus on informing DNNs with physics through differential equations or numerical simulation, much may be gained through integration of simplified relationships. To evaluate this hypothesis, a number of thin rectangular plates simply-supported on all edges are simulated for five materials. With plate dimensions and material properties as input features and fundamental natural frequency as the sole output, predictive performance of a purely data-driven DNN-based model is compared with models using additional inputs computed from simplified physical relationships among baseline parameters, namely plate weight, modulus of rigidity, and shear modulus. To better understand the benefit to model accuracy, these additional features are injected into various single and multiple DNN layers, and trained with four different dataset sizes. When these physics-enhanced models are evaluated against independent data of the same materials and similar dimensions to the training sets, supplementation with simplified physics-based parameters provides little reduction in prediction error over the baseline for models trained with dataset sizes of 60 and greater, although small improvement from 19.3% to 16.1% occurs when trained with a sparse size of 30. Conversely, notable accuracy gains occur when the independent test data is of material and dimensions not conforming to the training set. Specifically, when physics-enhanced data is injected into multiple DNN layers, reductions in error from 33.2% to 19.6%, 34.9% to 19.9%, 35.8% to 22.4%, and 43.0% to 28.4% are achieved for training dataset sizes of 261, 117, 60, and 30, respectively, demonstrating attainment of a degree of generalizability.
    ConDor: Self-Supervised Canonicalization of 3D Pose for Partial Shapes. (arXiv:2201.07788v2 [cs.CV] UPDATED)
    Progress in 3D object understanding has relied on manually canonicalized shape datasets that contain instances with consistent position and orientation (3D pose). This has made it hard to generalize these methods to in-the-wild shapes, e.g., from internet model collections or depth sensors. ConDor is a self-supervised method that learns to Canonicalize the 3D orientation and position for full and partial 3D point clouds. We build on top of Tensor Field Networks (TFNs), a class of permutation- and rotation-equivariant, and translation-invariant 3D networks. During inference, our method takes an unseen full or partial 3D point cloud at an arbitrary pose and outputs an equivariant canonical pose. During training, this network uses self-supervision losses to learn the canonical pose from an un-canonicalized collection of full and partial 3D point clouds. ConDor can also learn to consistently co-segment object parts without any supervision. Extensive quantitative results on four new metrics show that our approach outperforms existing methods while enabling new applications such as operation on depth images and annotation transfer.
    Concentration of Random Feature Matrices in High-Dimensions. (arXiv:2204.06935v1 [stat.ML])
    The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables: either both are random variables, or one is a random variable and the other is well-separated, i.e. there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical settings. The theoretical results are verified with numerical experiments.  ( 2 min )
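The objects studied here are easy to reproduce numerically in toy form: build a nonlinear random feature matrix from random data and random weights and inspect its singular values. The normalization below is illustrative and not necessarily the one used in the paper:

```python
import numpy as np

# Toy random feature matrix A_ij = phi(<x_i, w_j>) / sqrt(m), with random
# data X and random weights W and a cosine nonlinearity.  Its singular
# values characterize the conditioning of the random feature regression.
rng = np.random.default_rng(0)
n, d, m = 200, 100, 400                     # data points, dimension, features
X = rng.normal(size=(n, d)) / np.sqrt(d)    # random data, normalized rows
Wt = rng.normal(size=(d, m))                # random weights
A = np.cos(X @ Wt) / np.sqrt(m)             # asymmetric rectangular nonlinear matrix
s = np.linalg.svd(A, compute_uv=False)      # min(n, m) = 200 singular values
cond = s.max() / s.min()                    # conditioning of the linear system
print(s.shape, cond > 1.0)
```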
    Time Series of Non-Additive Metrics: Identification and Interpretation of Contributing Factors of Variance by Linear Decomposition. (arXiv:2204.06688v1 [cs.LG])
    The research paper addresses linear decomposition of time series of non-additive metrics that allows for the identification and interpretation of contributing factors (input features) of variance. Non-additive metrics, such as ratios, are widely used in a variety of domains. It commonly requires preceding aggregations of underlying variables that are used to calculate the metric of interest. The latter poses a dimensionality challenge when the input features and underlying variables are formed as two-dimensional arrays along elements, such as account or customer identifications, and time points. It rules out direct modeling of the time series of a non-additive metric as a function of input features. The article discusses a five-step approach: (1) segmentations of input features and the underlying variables of the metric that are supported by unsupervised autoencoders, (2) univariate or joint fittings of the metric by the aggregated input features on the segmented domains, (3) transformations of pre-screened input features according to the fitted models, (4) aggregation of the transformed features as time series, and (5) modelling of the metric time series as a sum of constrained linear effects of the aggregated features. Alternatively, approximation by numerical differentiation has been considered to linearize the metric. It allows for element-level univariate or joint modeling of step (2). The process of these analytical steps allows for a backward-looking explanatory decomposition of the metric as a sum of time series of the surviving input features. The paper includes a synthetic example that studies loss-to-balance monthly rates of a hypothetical retail credit portfolio. To validate that no latent factors other than the surviving input features have significant impacts on the metric, Statistical Process Control has been introduced for the residual time series.
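The linearization by numerical differentiation can be sketched for a ratio metric r = L/B: a first-order expansion splits each period's change in r into a numerator contribution and a denominator contribution. The function name and data below are hypothetical:

```python
import numpy as np

def ratio_contributions(L, B):
    """Linearize a ratio metric r = L/B via first-order differencing:
    dr ~= dL/B - (L/B^2) dB, splitting each period's change in the metric
    into a numerator (loss) contribution and a denominator (balance) one."""
    r = L / B
    dL, dB = np.diff(L), np.diff(B)
    num_contrib = dL / B[:-1]
    den_contrib = -L[:-1] / B[:-1] ** 2 * dB
    approx = num_contrib + den_contrib
    return r, num_contrib, den_contrib, approx

# Hypothetical monthly losses and balances of a retail credit portfolio:
L = np.array([1.0, 1.2, 1.1, 1.5])
B = np.array([100.0, 98.0, 105.0, 102.0])
r, nc, dc, approx = ratio_contributions(L, B)
exact = np.diff(r)
print(np.allclose(approx, exact, atol=5e-4))  # first-order terms track the exact change
```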
    End-to-end multi-particle reconstruction in high occupancy imaging calorimeters with graph neural networks. (arXiv:2204.01681v2 [physics.ins-det] UPDATED)
    We present an end-to-end reconstruction algorithm to build particle candidates from detector hits in next-generation granular calorimeters similar to that foreseen for the high-luminosity upgrade of the CMS detector. The algorithm exploits a distance-weighted graph neural network, trained with object condensation, a graph segmentation technique. Through a single-shot approach, the reconstruction task is paired with energy regression. We describe the reconstruction performance in terms of efficiency as well as in terms of energy resolution. In addition, we show the jet reconstruction performance of our method and discuss its inference computational cost. To our knowledge, this work is the first-ever example of single-shot calorimetric reconstruction of ${\cal O}(1000)$ particles in high-luminosity conditions with 200 pileup.
    Machine Learning State-of-the-Art with Uncertainties. (arXiv:2204.05173v2 [cs.LG] UPDATED)
    With the availability of data, hardware, software ecosystems, and relevant skill sets, the machine learning community is undergoing rapid development, with new architectures and approaches appearing at high frequency every year. In this article, we conduct an exemplary image classification study in order to demonstrate how confidence intervals around accuracy measurements can greatly enhance the communication of research results as well as impact the reviewing process. In addition, we explore the hallmarks and limitations of this approximation. We discuss the relevance of this approach by reflecting on a spotlight publication of ICLR 2022. A reproducible workflow is made available as an open-source adjoint to this publication. Based on our discussion, we make suggestions for improving the authoring and reviewing process of machine learning articles.
    Characterizing the Fundamental Trade-offs in Learning Invariant Representations. (arXiv:2109.03386v2 [cs.LG] UPDATED)
    Many applications of representation learning, such as privacy-preservation, algorithmic fairness, and domain adaptation, desire explicit control over semantic information being discarded. This goal is formulated as satisfying two objectives: maximizing utility for predicting a target attribute while simultaneously being independent or invariant with respect to a known semantic attribute. Solutions to such problems lead to trade-offs between the two objectives when they are competing with each other. While existing works study bounds on these trade-offs, three questions still remain outstanding: 1) \emph{What are the exact fundamental trade-offs between utility and invariance?}, 2) \emph{What is the optimal dimensionality of the representation?}, and 3) \emph{What are the encoders (mapping data to a representation) that achieve the exact fundamental trade-offs and how can we estimate them from data?} This paper addresses these questions. We adopt a functional analysis perspective and derive closed-form solutions for the global optima of the underlying optimization problems under mild assumptions, which in turn yields closed formulae for the exact trade-offs, optimal representation dimensionality, and the corresponding encoders. We also numerically quantify the trade-offs on representative problems and compare them to those achieved by baseline invariant representation learning algorithms.
    StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2. (arXiv:2112.14683v2 [cs.CV] UPDATED)
    Videos show continuous events, yet most $-$ if not all $-$ video synthesis frameworks treat them discretely in time. In this work, we think of videos as what they should be $-$ time-continuous signals, and extend the paradigm of neural representations to build a continuous-time video generator. For this, we first design continuous motion representations through the lens of positional embeddings. Then, we explore the question of training on very sparse videos and demonstrate that a good generator can be learned by using as few as 2 frames per clip. After that, we rethink the traditional image + video discriminators pair and design a holistic discriminator that aggregates temporal information by simply concatenating frames' features. This decreases the training cost and provides a richer learning signal to the generator, making it possible to train directly on 1024$^2$ videos for the first time. We build our model on top of StyleGAN2 and it is just ${\approx}5\%$ more expensive to train at the same resolution while achieving almost the same image quality. Moreover, our latent space features similar properties, enabling spatial manipulations that our method can propagate in time. We can generate arbitrarily long videos at arbitrarily high frame rates, while prior work struggles to generate even 64 frames at a fixed rate. Our model is tested on four modern 256$^2$ and one 1024$^2$-resolution video synthesis benchmarks. In terms of sheer metrics, it performs on average ${\approx}30\%$ better than the closest runner-up. Project website: https://universome.github.io.
    Kernel Thinning. (arXiv:2105.05842v7 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
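    The compression objective behind kernel thinning can be illustrated with a toy greedy selection in one dimension: pick a small subset whose empirical measure is close to the full sample in maximum mean discrepancy (MMD). This sketch is explicitly NOT the kernel thinning algorithm (which uses recursive halving with carefully chosen randomness and comes with the guarantees stated above); it only shows what the target quantity looks like:

```python
import math

def gauss_k(x, y, bw=1.0):
    """Gaussian kernel on the real line."""
    return math.exp(-((x - y) ** 2) / (2.0 * bw * bw))

def mmd2(S, X, bw=1.0):
    """Squared maximum mean discrepancy between empirical measures on S and X."""
    a = sum(gauss_k(s, t, bw) for s in S for t in S) / len(S) ** 2
    b = sum(gauss_k(s, x, bw) for s in S for x in X) / (len(S) * len(X))
    c = sum(gauss_k(x, y, bw) for x in X for y in X) / len(X) ** 2
    return a - 2.0 * b + c

def greedy_thin(X, m, bw=1.0):
    """Greedily pick m points of X whose empirical measure is MMD-close to X.
    An illustrative stand-in for the compression objective, not kernel thinning."""
    n = len(X)
    # mean kernel value of each candidate against the full sample (the cross term)
    kmean = [sum(gauss_k(X[i], x, bw) for x in X) / n for i in range(n)]
    chosen = []
    remaining = list(range(n))
    for _ in range(m):
        def obj(i):
            S = [X[j] for j in chosen] + [X[i]]
            a = sum(gauss_k(s, t, bw) for s in S for t in S) / len(S) ** 2
            b = sum(kmean[j] for j in chosen + [i]) / len(S)
            return a - 2.0 * b  # the X-vs-X term is constant, so omitted
        best = min(remaining, key=obj)
        chosen.append(best)
        remaining.remove(best)
    return [X[i] for i in chosen]
```

On a bimodal sample, the greedy subset covers both modes, whereas an arbitrary same-size subset (e.g. all points from one mode) has much larger MMD.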
    Epileptic Seizure Risk Assessment by Multi-Channel Imaging of the EEG. (arXiv:2204.07034v1 [eess.SP])
    Refractory epileptic patients can suffer a seizure at any moment. Seizure prediction would substantially improve their lives. In this work, based on scalp EEG and its transformation into images, the likelihood of an epileptic seizure occurring at any moment is computed using an average of the softmax layer output (the likelihood) of a CNN, instead of the output of the classification layer. Results show that by analyzing the likelihood and thresholding it, prediction has higher sensitivity or a lower FPR/h. The best threshold for the likelihood was higher than 50% for 5 patients, and was lower for the remaining 36. However, more testing is needed, especially in new seizures, to better assess the real performance of this method. This work is a proof of concept with a positive outlook.
    Learning Spectral Unions of Partial Deformable 3D Shapes. (arXiv:2104.00514v2 [cs.GR] UPDATED)
    Spectral geometric methods have brought revolutionary changes to the field of geometry processing. Of particular interest is the study of the Laplacian spectrum as a compact, isometry- and permutation-invariant representation of a shape. Some recent works show how the intrinsic geometry of a full shape can be recovered from its spectrum, but no approaches yet consider the more challenging problem of recovering the geometry from the spectral information of partial shapes. In this paper, we propose a possible way to fill this gap. We introduce a learning-based method to estimate the Laplacian spectrum of the union of partial non-rigid 3D shapes, without actually computing the 3D geometry of the union or any correspondence between those partial shapes. We do so by operating purely in the spectral domain and by defining the union operation between short sequences of eigenvalues. We show that the approximated union spectrum can be used as-is to reconstruct the complete geometry [MRC*19], perform region localization on a template [RTO*19] and retrieve shapes from a database, generalizing ShapeDNA [RWP06] to work with partialities. Working with eigenvalues allows us to deal with unknown correspondence, different sampling, and different discretizations (point clouds and meshes alike), making this operation especially robust and general. Our approach is data-driven and can generalize to isometric and non-isometric deformations of the surface, as long as these stay within the same semantic class (e.g., human bodies or horses), as well as to partiality artifacts not seen at training time.
    A Neural Network based Framework for Effective Laparoscopic Video Quality Assessment. (arXiv:2202.04517v2 [eess.IV] UPDATED)
    Video quality assessment is a challenging problem having a critical significance in the context of medical imaging. For instance, in laparoscopic surgery, the acquired video data suffers from different kinds of distortion that not only hinder surgery performance but also affect the execution of subsequent tasks in surgical navigation and robotic surgeries. For this reason, in this paper we propose neural network-based approaches for distortion classification as well as quality prediction. More precisely, a Residual Network (ResNet) based approach is first developed for a simultaneous ranking and classification task. Then, this architecture is extended to make it appropriate for the quality prediction task by using an additional Fully Connected Neural Network (FCNN). To train the overall architecture (ResNet and FCNN models), transfer learning and end-to-end learning approaches are investigated. Experimental results, carried out on a new laparoscopic video quality database, have shown the efficiency of the proposed methods compared to recent conventional and deep learning based approaches.
    SemiMultiPose: A Semi-supervised Multi-animal Pose Estimation Framework. (arXiv:2204.07072v1 [cs.CV])
    Multi-animal pose estimation is essential for studying animals' social behaviors in neuroscience and neuroethology. Advanced approaches have been proposed to support multi-animal estimation and achieve state-of-the-art performance. However, these models rarely exploit unlabeled data during training even though real world applications have exponentially more unlabeled frames than labeled frames. Manually adding dense annotations for a large number of images or videos is costly and labor-intensive, especially for multiple instances. Given these deficiencies, we propose a novel semi-supervised architecture for multi-animal pose estimation, leveraging the abundant structures pervasive in unlabeled frames in behavior videos to enhance training, which is critical for sparsely-labeled problems. The resulting algorithm provides superior multi-animal pose estimation results on three animal experiments compared to the state-of-the-art baseline and exhibits more predictive power in sparsely-labeled data regimes.
    Transformers and the representation of biomedical background knowledge. (arXiv:2202.02432v2 [cs.CL] UPDATED)
    BioBERT and BioMegatron are Transformer models adapted for the biomedical domain based on publicly available biomedical corpora. As such, they have the potential to encode large-scale biological knowledge. We investigate the encoding and representation of biological knowledge in these models, and its potential utility to support inference in cancer precision medicine - namely, the interpretation of the clinical significance of genomic alterations. We compare the performance of different transformer baselines; we use probing to determine the consistency of encodings for distinct entities; and we use clustering methods to compare and contrast the internal properties of the embeddings for genes, variants, drugs and diseases. We show that these models do indeed encode biological knowledge, although some of this is lost in fine-tuning for specific tasks. Finally, we analyse how the models behave with regard to biases and imbalances in the dataset.
    Q-TART: Quickly Training for Adversarial Robustness and in-Transferability. (arXiv:2204.07024v1 [cs.CV])
    Raw deep neural network (DNN) performance is not enough; in real-world settings, computational load, training efficiency and adversarial security are just as or even more important. We propose to simultaneously tackle Performance, Efficiency, and Robustness, using our proposed algorithm Q-TART, Quickly Train for Adversarial Robustness and in-Transferability. Q-TART follows the intuition that samples highly susceptible to noise strongly affect the decision boundaries learned by DNNs, which in turn degrades their performance and increases their adversarial susceptibility. By identifying and removing such samples, we demonstrate improved performance and adversarial robustness while using only a subset of the training data. Through our experiments we highlight Q-TART's high performance across multiple Dataset-DNN combinations, including ImageNet, and provide insights into the complementary behavior of Q-TART alongside existing adversarial training approaches to increase robustness by over 1.3% while using up to 17.9% less training time.
    HASA: Hybrid Architecture Search with Aggregation Strategy for Echinococcosis Classification and Ovary Segmentation in Ultrasound Images. (arXiv:2204.06697v1 [cs.CV])
    Different from handcrafted features, deep neural networks can automatically learn task-specific features from data. Due to this data-driven nature, they have achieved remarkable success in various areas. However, manual design and selection of suitable network architectures are time-consuming and require substantial effort of human experts. To address this problem, researchers have proposed neural architecture search (NAS) algorithms which can automatically generate network architectures but suffer from heavy computational cost and instability if searching from scratch. In this paper, we propose a hybrid NAS framework for ultrasound (US) image classification and segmentation. The hybrid framework consists of a pre-trained backbone and several searched cells (i.e., network building blocks), which takes advantage of the strengths of both NAS and the expert knowledge from existing convolutional neural networks. Specifically, two effective and lightweight operations, a mixed depth-wise convolution operator and a squeeze-and-excitation block, are introduced into the candidate operations to enhance the variety and capacity of the searched cells. These two operations not only decrease model parameters but also boost network performance. Moreover, we propose a re-aggregation strategy for the searched cells, aiming to further improve the performance for different vision tasks. We tested our method on two large US image datasets, including a 9-class echinococcosis dataset containing 9566 images for classification and an ovary dataset containing 3204 images for segmentation. Ablation experiments and comparison with other handcrafted or automatically searched architectures demonstrate that our method can generate more powerful and lightweight models for the above US image classification and segmentation tasks.
    Streamable Neural Audio Synthesis With Non-Causal Convolutions. (arXiv:2204.07064v1 [cs.SD])
    Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses some serious challenges. To tackle this issue, the use of causal streaming convolutions has been proposed. However, this requires a specific, more complex training procedure and can impact the resulting audio quality. In this paper, we introduce a new method for producing non-causal streaming models. This makes any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it is able to transform models trained without causal constraints into streaming models. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it to the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods, while having no impact on the generation quality. Finally, we introduce two open-source implementations of our work as Max/MSP and PureData externals, and as a VST audio plugin. This endows traditional digital audio workstations with real-time neural audio synthesis on a laptop CPU.
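    The buffer-based computation referred to above can be illustrated for the simpler causal case: a 1-D causal convolution that caches its left context between chunks, so chunked output matches the offline output exactly. (The paper's contribution is handling the harder non-causal case via post-training reconfiguration, which introduces latency; this sketch only shows the streaming mechanism such models build on.)

```python
class StreamingConv1d:
    """Causal 1-D convolution processed chunk by chunk.  A cache of the last
    kernel_size - 1 input samples is kept between calls, so the streamed
    output is identical to the offline (whole-signal) output."""

    def __init__(self, weights):
        self.w = list(weights)                    # w[-1] multiplies the current sample
        self.cache = [0.0] * (len(weights) - 1)   # zero left-padding at start

    def process(self, chunk):
        buf = self.cache + list(chunk)
        k = len(self.w)
        out = [sum(self.w[j] * buf[i + j] for j in range(k))
               for i in range(len(chunk))]
        self.cache = buf[len(buf) - (k - 1):]     # keep context for the next chunk
        return out
```

Feeding the signal in arbitrary chunk sizes yields the same samples as convolving the whole signal at once, which is exactly the property real-time audio plugins need.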
    Geometric Deep Learning to Identify the Critical 3D Structural Features of the Optic Nerve Head for Glaucoma Diagnosis. (arXiv:2204.06931v1 [eess.IV])
    Purpose: The optic nerve head (ONH) undergoes complex and deep 3D morphological changes during the development and progression of glaucoma. Optical coherence tomography (OCT) is the current gold standard to visualize and quantify these changes, however the resulting 3D deep-tissue information has not yet been fully exploited for the diagnosis and prognosis of glaucoma. To this end, we aimed: (1) To compare the performance of two relatively recent geometric deep learning techniques in diagnosing glaucoma from a single OCT scan of the ONH; and (2) To identify the 3D structural features of the ONH that are critical for the diagnosis of glaucoma. Methods: In this study, we included a total of 2,247 non-glaucoma and 2,259 glaucoma scans from 1,725 subjects. All subjects had their ONHs imaged in 3D with Spectralis OCT. All OCT scans were automatically segmented using deep learning to identify major neural and connective tissues. Each ONH was then represented as a 3D point cloud. We used PointNet and dynamic graph convolutional neural network (DGCNN) to diagnose glaucoma from such 3D ONH point clouds and to identify the critical 3D structural features of the ONH for glaucoma diagnosis. Results: Both the DGCNN (AUC: 0.97$\pm$0.01) and PointNet (AUC: 0.95$\pm$0.02) were able to accurately detect glaucoma from 3D ONH point clouds. The critical points formed an hourglass pattern with most of them located in the inferior and superior quadrant of the ONH. Discussion: The diagnostic accuracy of both geometric deep learning approaches was excellent. Moreover, we were able to identify the critical 3D structural features of the ONH for glaucoma diagnosis that tremendously improved the transparency and interpretability of our method. Consequently, our approach may have strong potential to be used in clinical applications for the diagnosis and prognosis of a wide range of ophthalmic disorders.
    Integration of neural network and fuzzy logic decision making compared with bilayered neural network in the simulation of daily dew point temperature. (arXiv:2202.12256v2 [cs.LG] UPDATED)
    In this research, dew point temperature (DPT) is simulated using a data-driven approach. An Adaptive Neuro-Fuzzy Inference System (ANFIS) is utilized as the data-driven technique to forecast this parameter at Tabriz in East Azerbaijan. Various input patterns, namely T min, T max, and T mean, are utilized for training the architecture whilst DPT is the model's output. The findings indicate that, in general, the ANFIS method is capable of identifying data patterns with a high degree of accuracy. However, processing time, the number of iterations, and computing resources may increase substantially as additional membership functions are included; as a result, tuning parameters have to be optimized inside the method framework. The findings demonstrate a high agreement between the results of the data-driven technique (machine learning method) and the observed data. Using this prediction toolkit, DPT can be adequately forecasted solely based on the temperature distribution of Tabriz. This kind of modeling is extremely promising for predicting DPT at various sites. Besides, this study thoroughly compares the Bilayered Neural Network (BNN) and ANFIS models on various scales. Whilst the ANFIS model is extremely stable for almost all numbers of membership functions, the BNN model is highly sensitive to this scale factor to predict DPT.
    Improving Top-K Decoding for Non-Autoregressive Semantic Parsing via Intent Conditioning. (arXiv:2204.06748v1 [cs.CL])
    Semantic parsing (SP) is a core component of modern virtual assistants like Google Assistant and Amazon Alexa. While sequence-to-sequence-based auto-regressive (AR) approaches are common for conversational semantic parsing, recent studies employ non-autoregressive (NAR) decoders and reduce inference latency while maintaining competitive parsing quality. However, a major drawback of NAR decoders is the difficulty of generating top-k (i.e., k-best) outputs with approaches such as beam search. To address this challenge, we propose a novel NAR semantic parser that introduces intent conditioning on the decoder. Inspired by the traditional intent and slot tagging parsers, we decouple the top-level intent prediction from the rest of a parse. As the top-level intent largely governs the syntax and semantics of a parse, the intent conditioning allows the model to better control beam search and improves the quality and diversity of top-k outputs. We introduce a hybrid teacher-forcing approach to avoid training and inference mismatch. We evaluate the proposed NAR on conversational SP datasets, TOP & TOPv2. Like the existing NAR models, we maintain the O(1) decoding time complexity while generating more diverse outputs and improving the top-3 exact match (EM) by 2.4 points. In comparison with AR models, our model speeds up beam search inference by 6.7 times on CPU with competitive top-k EM.
    A Collection of Deep Learning-based Feature-Free Approaches for Characterizing Single-Objective Continuous Fitness Landscapes. (arXiv:2204.05752v2 [cs.LG] UPDATED)
    Exploratory Landscape Analysis is a powerful technique for numerically characterizing landscapes of single-objective continuous optimization problems. Landscape insights are crucial both for problem understanding as well as for assessing benchmark set diversity and composition. Despite the irrefutable usefulness of these features, they suffer from their own ailments and downsides. Hence, in this work we provide a collection of different approaches to characterize optimization landscapes. Similar to conventional landscape features, we require a small initial sample. However, instead of computing features based on that sample, we develop alternative representations of the original sample. These range from point clouds to 2D images and, therefore, are entirely feature-free. We demonstrate and validate our devised methods on the BBOB testbed and predict, with the help of Deep Learning, the high-level, expert-based landscape properties such as the degree of multimodality and the existence of funnel structures. The quality of our approaches is on par with methods relying on the traditional landscape features. Thereby, we provide an exciting new perspective on every research area which utilizes problem information such as problem understanding and algorithm design as well as automated algorithm configuration and selection.
    Proceedings of TDA: Applications of Topological Data Analysis to Data Science, Artificial Intelligence, and Machine Learning Workshop at SDM 2022. (arXiv:2204.01142v2 [math.AT] UPDATED)
    Topological Data Analysis (TDA) is a rigorous framework that borrows techniques from geometric and algebraic topology, category theory, and combinatorics in order to study the "shape" of complex high-dimensional data. Research in this area has grown significantly over the last several years, bringing a deeply rooted theory to bear on practical applications in areas such as genomics, natural language processing, medicine, cybersecurity, energy, and climate change. Within some of these areas, TDA has also been used to augment AI and ML techniques. We believe there is further utility to be gained in this space that can be facilitated by a workshop bringing together experts (both theorists and practitioners) and non-experts. Currently there is an active community of pure mathematicians with research interests in developing and exploring the theoretical and computational aspects of TDA. Applied mathematicians and other practitioners are also present in the community but do not represent a majority. This speaks to the primary aim of this workshop, which is to grow a wider community of interest in TDA. By fostering meaningful exchanges between these groups, from across government, academia, and industry, we hope to create new synergies that can only come through building a mutual comprehensive awareness of the problem and solution spaces.
    ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision. (arXiv:2204.06863v1 [cs.LG])
    A way to overcome expensive and time-consuming manual data labeling is weak supervision - automatic annotation of data samples via a predefined set of labeling functions (LFs), rule-based mechanisms that generate potentially erroneous labels. In this work, we investigate noise reduction techniques for weak supervision based on the principle of k-fold cross-validation. In particular, we extend two frameworks for detecting the erroneous samples in manually annotated data to the weakly supervised setting. Our methods profit from leveraging the information about matching LFs and detect noisy samples more accurately. We also introduce a new algorithm for denoising the weakly annotated data called ULF, that refines the allocation of LFs to classes by estimating the reliable LFs-to-classes joint matrix. Evaluation on several datasets shows that ULF successfully improves weakly supervised learning without using any manually labeled data.
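    The k-fold cross-validation principle behind these methods can be sketched concretely: train a model on k-1 folds of the weakly labeled data and flag samples whose out-of-fold prediction disagrees with their weak label. The nearest-centroid model and toy data below are illustrative stand-ins (ULF itself additionally estimates the LFs-to-classes joint matrix, which this sketch does not attempt):

```python
def nearest_centroid_fit(xs, ys):
    """Fit one centroid per class on 1-D features."""
    cents = {}
    for c in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == c]
        cents[c] = sum(pts) / len(pts)
    return cents

def nearest_centroid_predict(cents, x):
    return min(cents, key=lambda c: abs(x - cents[c]))

def flag_noisy(xs, weak_labels, k=5):
    """Flag indices whose out-of-fold prediction disagrees with the weak label."""
    n = len(xs)
    flags = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        cents = nearest_centroid_fit([xs[i] for i in train_idx],
                                     [weak_labels[i] for i in train_idx])
        for i in test_idx:
            if nearest_centroid_predict(cents, xs[i]) != weak_labels[i]:
                flags.append(i)
    return sorted(flags)
```

On two well-separated clusters with a couple of flipped weak labels, exactly the flipped samples are flagged, which is the denoising signal such frameworks build on.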
    A Melody-Unsupervision Model for Singing Voice Synthesis. (arXiv:2110.06546v2 [eess.AS] UPDATED)
    Recent studies in singing voice synthesis have achieved high-quality results leveraging advances in text-to-speech models based on deep neural networks. One of the main issues in training singing voice synthesis models is that they require melody and lyric labels to be temporally aligned with audio data. The temporal alignment is time-consuming manual work when preparing the training data. To address the issue, we propose a melody-unsupervision model that requires only audio-and-lyrics pairs without temporal alignment at training time but generates singing voice audio given a melody and lyrics input at inference time. The proposed model is composed of a phoneme classifier and a singing voice generator jointly trained in an end-to-end manner. The model can be fine-tuned by adjusting the amount of supervision with temporally aligned melody labels. Through experiments in melody-unsupervision and semi-supervision settings, we compare the audio quality of the synthesized singing voice. We also show that the proposed model is capable of being trained with speech audio and text labels but can generate singing voice at inference time.
    Unsupervised Temporal Learning on Monocular Videos for 3D Human Pose Estimation. (arXiv:2012.01511v3 [cs.CV] UPDATED)
    In this paper we propose an unsupervised learning method to extract temporal information from monocular videos, where we detect and encode the subject of interest in each frame and leverage contrastive self-supervised (CSS) learning to extract rich latent vectors. Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs as in other CSS approaches, we explicitly disentangle each latent vector into a time-variant component and a time-invariant one. We then show that applying CSS only to the time-variant features, encouraging a gradual transition on them between nearby and distant frames, while also reconstructing the input, extracts rich temporal features into the time-variant component that are well-suited for human pose estimation. Our approach reduces error by about 50\% compared to the standard CSS strategies, outperforms other unsupervised single-view methods and matches the performance of multi-view techniques.
    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values. (arXiv:2109.10431v2 [cs.LG] UPDATED)
    We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
    A Simple and Efficient Sampling-based Algorithm for General Reachability Analysis. (arXiv:2112.05745v3 [eess.SY] UPDATED)
    In this work, we analyze an efficient sampling-based algorithm for general-purpose reachability analysis, which remains a notoriously challenging problem with applications ranging from neural network verification to safety analysis of dynamical systems. By sampling inputs, evaluating their images in the true reachable set, and taking their $\epsilon$-padded convex hull as a set estimator, this algorithm applies to general problem settings and is simple to implement. Our main contribution is the derivation of asymptotic and finite-sample accuracy guarantees using random set theory. This analysis informs algorithmic design to obtain an $\epsilon$-close reachable set approximation with high probability, provides insights into which reachability problems are most challenging, and motivates safety-critical applications of the technique. On a neural network verification task, we show that this approach is more accurate and significantly faster than prior work. Informed by our analysis, we also design a robust model predictive controller that we demonstrate in hardware experiments.
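    The core of the sampling-based algorithm is easy to state in code. In one dimension the convex hull of the sampled images is simply the interval [min, max], so the epsilon-padded hull reduces to widening both endpoints; this sketch is a 1-D illustration of that idea, with the dynamics and sampling distribution chosen arbitrarily (the paper's guarantees concern the general convex-hull estimator, not this toy):

```python
def estimate_reachable_set_1d(f, sample_input, n_samples, eps):
    """Monte-Carlo reachable-set estimate: sample inputs, map them through the
    true dynamics f, and return the epsilon-padded convex hull of the outputs.
    In 1-D the convex hull is the interval [min, max] of the images."""
    ys = [f(sample_input()) for _ in range(n_samples)]
    return (min(ys) - eps, max(ys) + eps)
```

With enough samples, the padded interval contains the true reachable set with high probability while staying close to it, which is the accuracy/overapproximation trade-off the paper's random-set analysis quantifies.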
    Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. (arXiv:2110.07232v2 [cs.LG] UPDATED)
    In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through the feedback of an evaluation or simulation oracle. In real life, the feedback from such oracles is often noisy and available after some unknown delay that may depend on the computation time of the oracle. Additionally, if the exact evaluations are expensive but coarse approximations are available at a lower cost, the feedback can be multi-fidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify the regret of PCTS under delayed, noisy, and multi-fidelity feedback. Specifically, we derive regret bounds of PCTS enabled with the delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of non-delayed HOO for an expected delay of $O(\log T)$ and worsens by $O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ for $\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedback with different noise levels, delays, and fidelities.
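    The delayed-feedback model can be made concrete with a plain UCB1 loop in which each pulled reward is only revealed a fixed number of steps later. Note this is vanilla UCB1 under delay, written to illustrate the feedback model; the paper's DUCB1/DUCBV algorithms have further modifications that this sketch does not reproduce:

```python
import math
import heapq

def ucb1_with_delay(reward_fns, horizon, delay):
    """Vanilla UCB1 where each pulled reward is revealed `delay` steps later."""
    k = len(reward_fns)
    counts = [0] * k        # revealed observations per arm
    sums = [0.0] * k
    pulls = [0] * k
    pending = []            # min-heap of (reveal_time, arm, reward)
    for t in range(horizon):
        # reveal all feedback whose delay has elapsed
        while pending and pending[0][0] <= t:
            _, a, r = heapq.heappop(pending)
            counts[a] += 1
            sums[a] += r
        def index(a):
            if counts[a] == 0:
                return float('inf')   # force an initial pull of every arm
            return sums[a] / counts[a] + math.sqrt(2.0 * math.log(t + 1) / counts[a])
        arm = max(range(k), key=index)
        pulls[arm] += 1
        heapq.heappush(pending, (t + delay, arm, reward_fns[arm]()))
    return pulls
```

Even with the delay, the optimistic index concentrates pulls on the better arm once its feedback arrives; increasing the delay increases the number of "blind" pulls, which is exactly the degradation the PCTS regret bounds track.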
    PEg TRAnsfer Workflow recognition challenge report: Does multi-modal data improve recognition?. (arXiv:2202.05821v2 [cs.LG] UPDATED)
    This paper presents the design and results of the "PEg TRAnsfer Workflow recognition" (PETRAW) challenge, whose objective was to develop surgical workflow recognition methods based on one or several modalities, among video, kinematic, and segmentation data, in order to study their added value. The PETRAW challenge provided a data set of 150 peg transfer sequences performed on a virtual simulator. This data set was composed of videos, kinematics, semantic segmentation, and workflow annotations which described the sequences at three different granularity levels: phase, step, and activity. Five tasks were proposed to the participants: three of them were related to the recognition of all granularities with one of the available modalities, while the others addressed recognition with a combination of modalities. Average application-dependent balanced accuracy (AD-Accuracy) was used as the evaluation metric to take unbalanced classes into account and because it is more clinically relevant than a frame-by-frame score. Seven teams participated in at least one task and four of them in all tasks. The best results were obtained using video and kinematic data together, with an AD-Accuracy between 90% and 93% for the four teams who participated in all tasks. The improvement of video/kinematic-based methods over uni-modal ones was significant for all of the teams. However, the difference in testing execution time between the video/kinematic-based and the kinematic-based methods has to be taken into consideration. Is it relevant to spend 20 to 200 times more computing time for less than 3% of improvement? The PETRAW data set is publicly available at www.synapse.org/PETRAW to encourage further research in surgical workflow recognition.
    Modelling Non-Smooth Signals with Complex Spectral Structure. (arXiv:2203.06997v2 [stat.ML] UPDATED)
    The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results.
    Adversarial Parameter Defense by Multi-Step Risk Minimization. (arXiv:2109.02889v2 [cs.LG] UPDATED)
    Previous studies demonstrate that DNNs are vulnerable to adversarial examples and that adversarial training can establish a defense against them. In addition, recent studies show that deep neural networks also exhibit vulnerability to parameter corruptions. The vulnerability of model parameters is of crucial value to the study of model robustness and generalization. In this work, we introduce the concept of parameter corruption and propose to leverage loss change indicators for measuring the flatness of the loss basin and the robustness of neural network parameters. On this basis, we analyze parameter corruptions and propose the multi-step adversarial corruption algorithm. To enhance neural networks, we propose the adversarial parameter defense algorithm, which minimizes the average risk of multiple adversarial parameter corruptions. Experimental results show that the proposed algorithm can improve both the parameter robustness and accuracy of neural networks.
    The Pseudo Projection Operator: Applications of Deep Learning to Projection Based Filtering in Non-Trivial Frequency Regimes. (arXiv:2111.07140v3 [eess.SP] UPDATED)
    Traditional frequency based projection filters, or projection operators (PO), separate signal and noise through a series of transformations which remove frequencies where noise is present. However, this technique relies on a priori knowledge of what frequencies contain signal and noise and that these frequencies do not overlap, which is difficult to achieve in practice. To address these issues, we introduce a PO-neural network hybrid model, the Pseudo Projection Operator (PPO), which leverages a neural network to perform frequency selection. We compare the filtering capabilities of a PPO, PO, and denoising autoencoder (DAE) on the University of Rochester Multi-Modal Music Performance Dataset with a variety of added noise types. In the majority of experiments, the PPO outperforms both the PO and DAE. Based upon these results, we suggest future application of the PPO to filtering problems in the physical and biological sciences.
    Planting Undetectable Backdoors in Machine Learning Models. (arXiv:2204.06974v1 [cs.LG])
    Given the computational cost and technical expertise required to train machine learning models, users may delegate the task of learning to a service provider. We show how a malicious learner can plant an undetectable backdoor into a classifier. On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. First, we show how to plant a backdoor in any model, using digital signature schemes. The construction guarantees that given black-box access to the original model and the backdoored version, it is computationally infeasible to find even a single input where they differ. This property implies that the backdoored model has generalization error comparable with the original model. Second, we demonstrate how to insert undetectable backdoors in models trained using the Random Fourier Features (RFF) learning paradigm or in Random ReLU networks. In this construction, undetectability holds against powerful white-box distinguishers: given a complete description of the network and the training data, no efficient distinguisher can guess whether the model is "clean" or contains a backdoor. Our construction of undetectable backdoors also sheds light on the related issue of robustness to adversarial examples. In particular, our construction can produce a classifier that is indistinguishable from an "adversarially robust" classifier, but where every input has an adversarial example! In summary, the existence of undetectable backdoors represents a significant theoretical roadblock to certifying adversarial robustness.
    Your fairness may vary: Pretrained language model fairness in toxic text classification. (arXiv:2108.01250v3 [cs.CL] UPDATED)
    The popularity of pretrained language models in natural language processing systems calls for a careful evaluation of such models in down-stream tasks, which have a higher potential for societal impact. The evaluation of such systems usually focuses on accuracy measures. Our findings in this paper call for attention to be paid to fairness measures as well. Through the analysis of more than a dozen pretrained language models of varying sizes on two toxic text classification tasks (English), we demonstrate that focusing on accuracy measures alone can lead to models with wide variation in fairness characteristics. Specifically, we observe that fairness can vary even more than accuracy with increasing training data size and different random initializations. At the same time, we find that little of the fairness variation is explained by model size, despite claims in the literature. To improve model fairness without retraining, we show that two post-processing methods developed for structured, tabular data can be successfully applied to a range of pretrained language models. Warning: This paper contains samples of offensive text.
    Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning. (arXiv:2106.06047v2 [cs.LG] UPDATED)
    Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as the lack of convergence and the potential for catastrophic forgetting across real-world heterogeneous devices. In this paper, we demonstrate that self-attention-based architectures (e.g., Transformers) are more robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We release our code and pretrained models at https://github.com/Liangqiong/ViT-FL-main to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.
    LDPC codes: tracking non-stationary channel noise using sequential variational Bayesian estimates. (arXiv:2204.07037v1 [eess.SP])
    We present a sequential Bayesian learning method for tracking non-stationary signal-to-noise ratios in LDPC codes using probabilistic graphical models. We represent the LDPC code as a cluster graph using a general-purpose cluster graph construction algorithm called the layered trees running intersection property (LTRIP) algorithm. The channel noise estimator is a global Gamma cluster, which we extend to allow for Bayesian tracking of non-stationary noise variation. We evaluate our proposed model on real-world 5G drive test data. Our results show that our model is capable of tracking non-stationary channel noise, outperforming an LDPC decoder that assumes fixed knowledge of the actual average channel noise.
    Exploring Dual Encoder Architectures for Question Answering. (arXiv:2204.07120v1 [cs.CL])
    Dual encoders have been used for question-answering (QA) and information retrieval (IR) tasks with good results. There are two major types of dual encoders: Siamese Dual Encoders (SDE), with parameters shared across the two encoders, and Asymmetric Dual Encoders (ADE), with two distinctly parameterized encoders. In this work, we explore dual encoder architectures for QA retrieval tasks. By evaluating on MS MARCO and the MultiReQA benchmark, we show that SDE performs significantly better than ADE. We further propose three different improved versions of ADEs. Based on the evaluation of QA retrieval tasks and direct analysis of the embeddings, we demonstrate that sharing parameters in projection layers enables ADEs to perform competitively with SDEs.
    Character-focused Video Thumbnail Retrieval. (arXiv:2204.06563v1 [cs.CV])
    We explore retrieving character-focused video frames as candidates for video thumbnails. To evaluate each frame of the video based on the character(s) present in it, characters (faces) are assessed in two respects. Facial expression: we train a CNN model to measure whether a face has an acceptable facial expression for appearing in a video thumbnail. This model is trained to distinguish faces extracted from artworks/thumbnails from faces extracted from random frames of videos. Prominence and interactions: character(s) in the thumbnail should be important character(s) in the video, to prevent the algorithm from suggesting non-representative frames as candidates. We use face clustering to identify the characters in the video, and form a graph in which the prominence (frequency of appearance) of the character(s) and their interactions (co-occurrence) are captured. We use this graph to infer the relevance of the characters present in each candidate frame. Once every face is scored on the two criteria above, we infer frame-level scores by combining the scores of all the faces within a frame.
    CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing. (arXiv:2204.06625v1 [cs.CL])
    Model ensemble is a popular approach to produce a low-variance and well-generalized model. However, it induces large memory and inference costs, which are often not affordable for real-world deployment. Existing work has resorted to sharing weights among models. However, when increasing the proportion of the shared weights, the resulting models tend to be similar, and the benefits of using model ensemble diminish. To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO. Specifically, we share the weights of bottom layers across all models and apply different perturbations to the hidden representations for different models, which can effectively promote the model diversity. Meanwhile, we apply a prediction consistency regularizer across the perturbed models to control the variance due to the model diversity. Our experiments using large language models demonstrate that CAMERO significantly improves the generalization performance of the ensemble model. Specifically, CAMERO outperforms the standard ensemble of 8 BERT-base models on the GLUE benchmark by 0.7 with a significantly smaller model size (114.2M vs. 880.6M).
    Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback. (arXiv:2204.06660v1 [cs.LG])
    We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, whose performance is analyzed with regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings) and the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second order bounds in terms of the sum of squared losses and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.
    A Unified Analysis of Dynamic Interactive Learning. (arXiv:2204.07071v1 [cs.LG])
    In this paper we investigate the problem of learning evolving concepts over a combinatorial structure. Previous work by Emamjomeh-Zadeh et al. [2020] introduced dynamics into interactive learning as a way to model non-static user preferences in clustering problems or recommender systems. We provide many useful contributions to this problem. First, we give a framework that captures both of the models analyzed by [Emamjomeh-Zadeh et al., 2020], which allows us to study any type of concept evolution and matches the same query complexity bounds and running time guarantees of the previous models. Using this general model we solve the open problem of closing the gap between the upper and lower bounds on query complexity. Finally, we study an efficient algorithm where the learner simply follows the feedback at each round, and we provide mistake bounds for low diameter graphs such as cliques, stars, and general o(log n) diameter graphs by using a Markov Chain model.
    Improving Computational Complexity in Statistical Models with Second-Order Information. (arXiv:2202.04219v3 [stat.ML] UPDATED)
    It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for the application. To further improve that computational complexity, we consider the utilization of the second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of statistical models. When the population loss function, i.e., the limit of the empirical loss function when $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models: generalized linear models and mixture models, and experimental results support our prediction with general theory.
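    The contrast between NormGD and fixed-step gradient descent can be seen on a toy homogeneous loss. Below is an illustrative Python sketch (my own simplification, not from the paper) on $f(x) = x^4$, whose Hessian degenerates at the optimum as in singular models: scaling the step by the Hessian's magnitude (the 1-D analogue of its maximum eigenvalue) restores geometric convergence, while fixed-step GD crawls.

```python
def grad(x):
    return 4 * x ** 3        # f(x) = x^4: gradient vanishes fast near 0

def hess(x):
    return 12 * x ** 2       # in 1-D the max Hessian eigenvalue is |f''(x)|

def norm_gd(x0, eta=0.5, iters=50):
    """Gradient step scaled by the Hessian magnitude (NormGD-style)."""
    x = x0
    for _ in range(iters):
        x -= eta * grad(x) / max(abs(hess(x)), 1e-12)
    return x

def plain_gd(x0, eta=0.01, iters=50):
    """Fixed step-size gradient descent for comparison."""
    x = x0
    for _ in range(iters):
        x -= eta * grad(x)
    return x

x_norm, x_plain = norm_gd(1.0), plain_gd(1.0)
print(x_norm, x_plain)  # NormGD contracts geometrically; plain GD stalls
```

    On this example the normalized update is $x \leftarrow x(1 - \eta/3)$, a geometric contraction, matching the logarithmic iteration count the paper proves for homogeneous population losses.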
    Sketching Algorithms and Lower Bounds for Ridge Regression. (arXiv:2204.06653v1 [cs.DS])
    We give a sketching-based iterative algorithm that computes $1+\varepsilon$ approximate solutions for the ridge regression problem $\min_x \|{Ax-b}\|_2^2 +\lambda\|{x}\|_2^2$ where $A \in \mathbb{R}^{n \times d}$ with $d \ge n$. Our algorithm, for a constant number of iterations (requiring a constant number of passes over the input), improves upon earlier work of Chowdhury et al., by requiring that the sketching matrix only has a weaker Approximate Matrix Multiplication (AMM) guarantee that depends on $\epsilon$, along with a constant subspace embedding guarantee. The earlier work instead requires that the sketching matrix have a subspace embedding guarantee that depends on $\epsilon$. For example, to produce a $1+\varepsilon$ approximate solution in $1$ iteration, which requires $2$ passes over the input, our algorithm requires the OSNAP embedding to have $m= O(n\sigma^2/\lambda\varepsilon)$ rows with a sparsity parameter $s = O(\log(n))$, whereas the earlier algorithm of Chowdhury et al., with the same number of rows of OSNAP requires a sparsity $s = O(\sqrt{\sigma^2/\lambda\varepsilon} \cdot \log(n))$, where $\sigma = \|{A}\|_2$ is the spectral norm of the matrix $A$. We also show that this algorithm can be used to give faster algorithms for kernel ridge regression. Finally, we show that the sketch size required for our algorithm is essentially optimal for a natural framework of algorithms for ridge regression by proving lower bounds on oblivious sketching matrices for AMM. The sketch size lower bounds for AMM may be of independent interest.
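    For reference, the ridge problem being sketched has the exact normal-equations solution $x = (A^\top A + \lambda I)^{-1} A^\top b$, which the iterative sketch-and-solve algorithm approximates. A tiny two-feature Python sketch of that exact solve (illustrative only, with a hypothetical helper name):

```python
def ridge_2d(A, b, lam):
    """Solve min_x ||Ax - b||^2 + lam ||x||^2 for d = 2 features
    via the normal equations (A^T A + lam I) x = A^T b, using the
    closed-form inverse of a 2x2 matrix."""
    g00 = sum(r[0] * r[0] for r in A) + lam
    g01 = sum(r[0] * r[1] for r in A)
    g11 = sum(r[1] * r[1] for r in A) + lam
    c0 = sum(r[0] * bi for r, bi in zip(A, b))
    c1 = sum(r[1] * bi for r, bi in zip(A, b))
    det = g00 * g11 - g01 * g01
    return [(g11 * c0 - g01 * c1) / det, (g00 * c1 - g01 * c0) / det]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 3.0]
print(ridge_2d(A, b, lam=0.0))   # ordinary least squares when lam = 0
print(ridge_2d(A, b, lam=0.6))   # larger lam shrinks the solution norm
```

    The sketching approach replaces the exact Gram matrix with a much smaller sketched product, which is where the AMM guarantee enters.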
    Activation Regression for Continuous Domain Generalization with Applications to Crop Classification. (arXiv:2204.07030v1 [cs.CV])
    Geographic variance in satellite imagery impacts the ability of machine learning models to generalise to new regions. In this paper, we model geographic generalisation in medium resolution Landsat-8 satellite imagery as a continuous domain adaptation problem, demonstrating how models generalise better with appropriate domain knowledge. We develop a dataset spatially distributed across the entire continental United States, providing macroscopic insight into the effects of geography on crop classification in multi-spectral and temporally distributed satellite imagery. Our method demonstrates improved generalisability from 1) passing geographically correlated climate variables along with the satellite data to a Transformer model and 2) regressing on the model features to reconstruct these domain variables. Combined, we provide a novel perspective on geographic generalisation in satellite imagery and a simple-yet-effective approach to leverage domain knowledge. Code is available at: \url{https://github.com/samar-khanna/cropmap}
    Any-resolution Training for High-resolution Image Synthesis. (arXiv:2204.07156v1 [cs.CV])
    Generative models operate at fixed resolution, even though natural images come in a variety of sizes. As high-resolution details are downsampled away, and low-resolution images are discarded altogether, precious supervision is lost. We argue that every pixel matters and create datasets with variable-size images, collected at their native resolutions. Taking advantage of this data is challenging; high-resolution processing is costly, and current architectures can only process fixed-resolution data. We introduce continuous-scale training, a process that samples patches at random scales to train a new generator with variable output resolutions. First, conditioning the generator on a target scale allows us to generate higher-resolution images than previously possible, without adding layers to the model. Second, by conditioning on continuous coordinates, we can sample patches that still obey a consistent global layout, which also allows for scalable training at higher resolutions. Controlled FFHQ experiments show our method takes advantage of the multi-resolution training data better than discrete multi-scale approaches, achieving better FID scores and cleaner high-frequency details. We also train on other natural image domains including churches, mountains, and birds, and demonstrate arbitrary scale synthesis with both coherent global layouts and realistic local details, going beyond 2K resolution in our experiments. Our project page is available at: https://chail.github.io/anyres-gan/.
    Neonatal Bowel Sound Detection Using Convolutional Neural Network and Laplace Hidden Semi-Markov Model. (arXiv:2108.07467v2 [cs.SD] UPDATED)
    Abdominal auscultation is a convenient, safe and inexpensive method to assess bowel conditions, which is essential in neonatal care. It helps early detection of neonatal bowel dysfunctions and allows timely intervention. This paper presents a neonatal bowel sound detection method to assist the auscultation. Specifically, a Convolutional Neural Network (CNN) is proposed to classify peristalsis and non-peristalsis sounds. The classification is then optimized using a Laplace Hidden Semi-Markov Model (HSMM). The proposed method is validated on abdominal sounds from 49 newborn infants admitted to our tertiary Neonatal Intensive Care Unit (NICU). The results show that the method can effectively detect bowel sounds, with accuracy and area-under-curve (AUC) scores of 89.81% and 83.96%, respectively, outperforming 13 baseline methods. Furthermore, the proposed Laplace HSMM refinement strategy is proven capable of enhancing other bowel sound detection models. The outcomes of this work have the potential to facilitate future telehealth applications for neonatal care. The source code of our work can be found at: https://bitbucket.org/chirudeakin/neonatal-bowel-sound-classification/
    Optimal Training of Fair Predictive Models. (arXiv:1910.04109v3 [stat.ML] UPDATED)
    Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.
    Masked Siamese Networks for Label-Efficient Learning. (arXiv:2204.07141v1 [cs.LG])
    We propose Masked Siamese Networks (MSN), a self-supervised learning framework for learning image representations. Our approach matches the representation of an image view containing randomly masked patches to the representation of the original unmasked image. This self-supervised pre-training strategy is particularly scalable when applied to Vision Transformers since only the unmasked patches are processed by the network. As a result, MSNs improve the scalability of joint-embedding architectures, while producing representations of a high semantic level that perform competitively on low-shot image classification. For instance, on ImageNet-1K, with only 5,000 annotated images, our base MSN model achieves 72.4% top-1 accuracy, and with 1% of ImageNet-1K labels, we achieve 75.7% top-1 accuracy, setting a new state-of-the-art for self-supervised learning on this benchmark. Our code is publicly available.
    Activation Map Adaptation for Effective Knowledge Distillation. (arXiv:2010.13500v2 [cs.CV] UPDATED)
    Model compression has become a recent trend due to the requirement of deploying neural networks on embedded and mobile devices. Hence, both accuracy and efficiency are of critical importance. To explore a balance between them, a knowledge distillation strategy is proposed for general visual representation learning. It utilizes our well-designed activation map adaptive module to replace some blocks of the teacher network, exploring the most appropriate supervisory features adaptively during the training process. The teacher's hidden-layer outputs prompt the training of the student network, transferring effective semantic information. To verify the effectiveness of our strategy, we apply our method to the CIFAR-10 dataset. Results demonstrate that the method can boost the accuracy of the student network by 0.6% with a 6.5% loss reduction, and significantly improve its training speed.
    Shedding New Light on the Language of the Dark Web. (arXiv:2204.06885v1 [cs.CL])
    The hidden nature and the limited accessibility of the Dark Web, combined with the lack of public datasets in this domain, make it difficult to study its inherent characteristics such as linguistic properties. Previous works on text classification in the Dark Web domain have suggested that the use of deep neural models may be ineffective, potentially due to the linguistic differences between the Dark and Surface Webs. However, not much work has been done to uncover the linguistic characteristics of the Dark Web. This paper introduces CoDA, a publicly available Dark Web dataset consisting of 10,000 web documents tailored towards text-based Dark Web analysis. By leveraging CoDA, we conduct a thorough linguistic analysis of the Dark Web and examine the textual differences between the Dark Web and the Surface Web. We also assess the performance of various methods of Dark Web page classification. Finally, we compare CoDA with an existing public Dark Web dataset and evaluate their suitability for various use cases.
    Estimating Structural Disparities for Face Models. (arXiv:2204.06562v1 [cs.CV])
    In machine learning, disparity metrics are often defined by measuring the difference in the performance or outcome of a model across different sub-populations (groups) of datapoints. Thus, the inputs to disparity quantification consist of a model's predictions $\hat{y}$, the ground-truth labels for the predictions $y$, and group labels $g$ for the data points. Performance of the model for each group is calculated by comparing $\hat{y}$ and $y$ for the datapoints within a specific group, and as a result, disparity of performance across the different groups can be calculated. In many real-world scenarios, however, group labels ($g$) may not be available at scale during training and validation time, or collecting them might not be feasible or desirable, as they are often sensitive information. As a result, evaluating disparity metrics across categorical groups would not be feasible. On the other hand, in many scenarios noisy groupings may be obtainable using some form of a proxy, which would allow measuring disparity metrics across sub-populations. Here we explore performing such an analysis on computer vision models trained on human faces, and on tasks such as face attribute prediction and affect estimation. Our experiments indicate that embeddings resulting from an off-the-shelf face recognition model could meaningfully serve as a proxy for such estimation.
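    Computing a disparity metric from predictions $\hat{y}$, labels $y$, and (possibly proxy) group labels $g$ is mechanically simple. A minimal Python sketch (illustrative; the maximum pairwise accuracy gap is one of several reasonable disparity measures):

```python
def group_disparity(y_true, y_pred, groups):
    """Per-group accuracy and the max pairwise accuracy gap (disparity)."""
    acc = {}
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        acc[g] = sum(y_true[i] == y_pred[i] for i in idx) / len(idx)
    vals = list(acc.values())
    return acc, max(vals) - min(vals)

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
groups = ["a", "a", "a", "b", "b", "b"]  # could be noisy proxy groups
acc, gap = group_disparity(y_true, y_pred, groups)
print(acc, gap)
```

    With proxy groups, `groups` is replaced by labels inferred from, e.g., clustered face-recognition embeddings, and the same computation applies.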
    ExPLoit: Extracting Private Labels in Split Learning. (arXiv:2112.01299v2 [cs.CR] UPDATED)
    Split learning is a popular technique used for vertical federated learning (VFL), where the goal is to jointly train a model on the private input and label data held by two parties. This technique uses a split-model, trained end-to-end, by exchanging the intermediate representations (IR) of the inputs and gradients of the IR between the two parties. We propose ExPLoit - a label-leakage attack that allows an adversarial input-owner to extract the private labels of the label-owner during split-learning. ExPLoit frames the attack as a supervised learning problem by using a novel loss function that combines gradient-matching and several regularization terms developed using key properties of the dataset and models. Our evaluations show that ExPLoit can uncover the private labels with near-perfect accuracy of up to 99.96%. Our findings underscore the need for better training techniques for VFL.
    A Study of Causal Confusion in Preference-Based Reward Learning. (arXiv:2204.06601v1 [cs.LG])
    Learning robot policies via preference-based reward learning is an increasingly popular method for customizing robot behavior. However, in recent years, there has been a growing body of anecdotal evidence that learning reward functions from preferences is prone to spurious correlations and reward gaming or hacking behaviors. While there is much anecdotal, empirical, and theoretical analysis of causal confusion and reward gaming behaviors both in reinforcement learning and imitation learning approaches that directly map from states to actions, we provide the first systematic study of causal confusion in the context of learning reward functions from preferences. To facilitate this study, we identify a set of three preference learning benchmark domains where we observe causal confusion when learning from offline datasets of pairwise trajectory preferences: a simple reacher domain, an assistive feeding domain, and an itch-scratching domain. To gain insight into this observed causal confusion, we present a sensitivity analysis that explores the effect of different factors--including the type of training data, reward model capacity, and feature dimensionality--on the robustness of rewards learned from preferences. We find evidence that learning rewards from pairwise trajectory preferences is highly sensitive and non-robust to spurious features and increasing model capacity, but not as sensitive to the type of training data. Videos, code, and supplemental results are available at https://sites.google.com/view/causal-reward-confusion.
    deep-significance - Easy and Meaningful Statistical Significance Testing in the Age of Neural Networks. (arXiv:2204.06815v1 [cs.LG])
    A lot of Machine Learning (ML) and Deep Learning (DL) research is of an empirical nature. Nevertheless, statistical significance testing (SST) is still not widely used. This endangers true progress, as seeming improvements over a baseline might be statistical flukes, leading follow-up research astray while wasting human and computational resources. Here, we provide an easy-to-use package containing different significance tests and utility functions specifically tailored towards research needs and usability.
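The package wraps tests suited to neural-network score distributions; as a generic illustration of the underlying idea (not the package's API), a paired sign-flip permutation test comparing two models' scores across random seeds might look like:

```python
import numpy as np

def permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Paired sign-flip permutation test on the mean score difference.

    Returns a two-sided p-value: the fraction of sign-flipped resamples
    whose mean difference is at least as extreme as the observed one.
    """
    rng = np.random.default_rng(seed)
    d = np.asarray(scores_a, float) - np.asarray(scores_b, float)
    observed = d.mean()
    signs = rng.choice([-1.0, 1.0], size=(n_resamples, d.size))
    null = (signs * d).mean(axis=1)
    return float(np.mean(np.abs(null) >= abs(observed)))

model_a = [0.81, 0.83, 0.80, 0.84, 0.82]   # e.g. accuracy over 5 random seeds
model_b = [0.79, 0.80, 0.78, 0.81, 0.79]
p = permutation_test(model_a, model_b)     # small p: improvement unlikely a fluke
```

With only five paired runs, the resolution of such a test is limited (here the smallest attainable p-value is 1/16), which is exactly the kind of pitfall dedicated significance-testing tooling is meant to surface.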
    To Split or Not to Split: The Impact of Disparate Treatment in Classification. (arXiv:2002.04788v4 [cs.LG] UPDATED)
    Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.
    OpenCSI: An Open-Source Dataset for Indoor Localization Using CSI-Based Fingerprinting. (arXiv:2104.07963v3 [eess.SP] UPDATED)
    Many applications require accurate indoor localization. Fingerprint-based localization methods propose a solution to this problem, but rely on a radio map that is effort-intensive to acquire. We automate the radio map acquisition phase using a software-defined radio (SDR) and a wheeled robot. Furthermore, we open-source a radio map acquired with our automated tool for a 3GPP Long-Term Evolution (LTE) wireless link. To the best of our knowledge, this is the first publicly available radio map containing channel state information (CSI). Finally, we describe first localization experiments on this radio map using a convolutional neural network to regress for location coordinates.
    Tight Bounds for Quantum State Certification with Incoherent Measurements. (arXiv:2204.07155v1 [quant-ph])
    We consider the problem of quantum state certification, where we are given the description of a mixed state $\sigma \in \mathbb{C}^{d \times d}$, $n$ copies of a mixed state $\rho \in \mathbb{C}^{d \times d}$, and $\varepsilon > 0$, and we are asked to determine whether $\rho = \sigma$ or whether $\| \rho - \sigma \|_1 > \varepsilon$. When $\sigma$ is the maximally mixed state $\frac{1}{d} I_d$, this is known as mixedness testing. We focus on algorithms which use incoherent measurements, i.e. which only measure one copy of $\rho$ at a time. Unlike those that use entangled, multi-copy measurements, these can be implemented without persistent quantum memory and thus represent a large class of protocols that can be run on current or near-term devices. For mixedness testing, there is a folklore algorithm which uses incoherent measurements and only needs $O(d^{3/2} / \varepsilon^2)$ copies. The algorithm is non-adaptive, that is, its measurements are fixed ahead of time, and is known to be optimal for non-adaptive algorithms. However, when the algorithm can make arbitrary incoherent measurements, the best known lower bound is only $\Omega (d^{4/3} / \varepsilon^2)$ [Bubeck-Chen-Li '20], and it has been an outstanding open problem to close this polynomial gap. In this work, 1) we settle the copy complexity of mixedness testing with incoherent measurements and show that $\Omega (d^{3/2} / \varepsilon^2)$ copies are necessary, and 2) we show the instance-optimal bounds for state certification to general $\sigma$ first derived by [Chen-Li-O'Donnell '21] for non-adaptive measurements also hold for arbitrary incoherent measurements. Qualitatively, our results say that adaptivity does not help at all for these problems. Our results are based on new techniques that allow us to reduce the problem to understanding certain matrix martingales, which we believe may be of independent interest.
    CLUES: A Benchmark for Learning Classifiers using Natural Language Explanations. (arXiv:2204.07142v1 [cs.CL])
    Supervised learning has traditionally focused on inductive learning by observing labeled examples of a task. In contrast, humans have the ability to learn new concepts from language. Here, we explore training zero-shot classifiers for structured data purely from language. For this, we introduce CLUES, a benchmark for Classifier Learning Using natural language ExplanationS, consisting of a range of classification tasks over structured data along with natural language supervision in the form of explanations. CLUES consists of 36 real-world and 144 synthetic classification tasks. It contains crowdsourced explanations describing real-world tasks from multiple teachers and programmatically generated explanations for the synthetic tasks. To model the influence of explanations in classifying an example, we develop ExEnt, an entailment-based model that learns classifiers using explanations. ExEnt generalizes up to 18% better (relative) on novel tasks than a baseline that does not use explanations. We delineate key challenges for automated learning from explanations, addressing which can lead to progress on CLUES in the future. Code and datasets are available at: https://clues-benchmark.github.io.
    Modeling the effects of environmental and perceptual uncertainty using deterministic reinforcement learning dynamics with partial observability. (arXiv:2109.07259v2 [nlin.AO] UPDATED)
Assessing the systemic effects of uncertainty that arises from agents' partial observation of the true states of the world is critical for understanding a wide range of scenarios. Yet, previous modeling work on agent learning and decision-making either lacks a systematic way to describe this source of uncertainty or puts the focus on obtaining optimal policies using complex models of the world that would impose an unrealistically high cognitive demand on real agents. In this work we aim to efficiently describe the emergent behavior of biologically plausible and parsimonious learning agents faced with partially observable worlds. Therefore we derive and present deterministic reinforcement learning dynamics where the agents observe the true state of the environment only partially. We showcase the broad applicability of our dynamics across different classes of partially observable agent-environment systems. We find that partial observability creates unintuitive benefits in a number of specific contexts, pointing the way to further research on a general understanding of such effects. For instance, partially observant agents can learn better outcomes faster, in a more stable way and even overcome social dilemmas. Furthermore, our method allows the application of dynamical systems theory to partially observable multiagent learning. In this regard we find the emergence of catastrophic limit cycles, a critical slowing down of the learning processes between reward regimes and the separation of the learning dynamics into fast and slow directions, all caused by partial observability. Therefore, the presented dynamics have the potential to become a formal, yet practical, lightweight and robust tool for researchers in biology, social science and machine learning to systematically investigate the effects of interacting partially observant agents.
    Multimodal spatiotemporal graph neural networks for improved prediction of 30-day all-cause hospital readmission. (arXiv:2204.06766v1 [cs.LG])
Measures to predict 30-day readmission are considered an important quality factor for hospitals as accurate predictions can reduce the overall cost of care by identifying high risk patients before they are discharged. While recent deep learning-based studies have shown promising empirical results on readmission prediction, several limitations exist that may hinder widespread clinical utility, such as (a) only patients with certain conditions are considered, (b) existing approaches do not leverage data temporality, (c) individual admissions are assumed independent of each other, which is unrealistic, (d) prior studies are usually limited to a single source of data and single-center data. To address these limitations, we propose a multimodal, modality-agnostic spatiotemporal graph neural network (MM-STGNN) for prediction of 30-day all-cause hospital readmission that fuses multimodal in-patient longitudinal data. By training and evaluating our methods using longitudinal chest radiographs and electronic health records from two independent centers, we demonstrate that MM-STGNN achieves AUROC of 0.79 on both primary and external datasets. Furthermore, MM-STGNN significantly outperforms the current clinical reference standard, LACE+ score (AUROC=0.61), on the primary dataset. For subset populations of patients with heart and vascular disease, our model also outperforms baselines on predicting 30-day readmission (e.g., 3.7 point improvement in AUROC in patients with heart disease). Lastly, qualitative model interpretability analysis indicates that while patients' primary diagnoses were not explicitly used to train the model, node features crucial for model prediction directly reflect patients' primary diagnoses. Importantly, our MM-STGNN is agnostic to node feature modalities and could be utilized to integrate multimodal data for triaging patients in various downstream resource allocation tasks.
    EvoSTS Forecasting: Evolutionary Sparse Time-Series Forecasting. (arXiv:2204.07066v1 [cs.NE])
In this work, we highlight our novel evolutionary sparse time-series forecasting algorithm, EvoSTS. The algorithm attempts to evolutionarily prioritize the weights of a Long Short-Term Memory (LSTM) network that best minimize the reconstruction loss of a predicted signal using a learned sparse coded dictionary. In each generation of our evolutionary algorithm, a set number of children with the same initial weights are spawned. Each child undergoes a training step and adjusts its weights on the same data. Due to stochastic back-propagation, the set of children has a variety of weights with different levels of performance. The weights that best minimize the reconstruction loss with a given signal dictionary are passed to the next generation. The predictions from the best-performing weights of the first and last generation are compared. We found improvements while comparing the weights of these two generations. However, due to several confounding parameters and hyperparameter limitations, some of the weights had negligible improvements. To the best of our knowledge, this is the first attempt to use sparse coding in this way to optimize time series forecasting model weights, such as those of an LSTM network.
    A deep learning algorithm for reducing false positives in screening mammography. (arXiv:2204.06671v1 [cs.CV])
    Screening mammography improves breast cancer outcomes by enabling early detection and treatment. However, false positive callbacks for additional imaging from screening exams cause unnecessary procedures, patient anxiety, and financial burden. This work demonstrates an AI algorithm that reduces false positives by identifying mammograms not suspicious for breast cancer. We trained the algorithm to determine the absence of cancer using 123,248 2D digital mammograms (6,161 cancers) and performed a retrospective study on 14,831 screening exams (1,026 cancers) from 15 US and 3 UK sites. Retrospective evaluation of the algorithm on the largest of the US sites (11,592 mammograms, 101 cancers) a) left the cancer detection rate unaffected (p=0.02, non-inferiority margin 0.25 cancers per 1000 exams), b) reduced callbacks for diagnostic exams by 31.1% compared to standard clinical readings, c) reduced benign needle biopsies by 7.4%, and d) reduced screening exams requiring radiologist interpretation by 41.6% in the simulated clinical workflow. This work lays the foundation for semi-autonomous breast cancer screening systems that could benefit patients and healthcare systems by reducing false positives, unnecessary procedures, patient anxiety, and expenses.
    SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study. (arXiv:2204.06699v1 [cs.LG])
Self-supervised pre-training methods have brought remarkable breakthroughs in the understanding of text, image, and speech. Recent developments in genomics have also adopted these pre-training methods for genome understanding. However, they focus only on understanding haploid sequences, which hinders their applicability to understanding genetic variations, also known as single nucleotide polymorphisms (SNPs), which are crucial for genome-wide association studies. In this paper, we introduce SNP2Vec, a scalable self-supervised pre-training approach for understanding SNPs. We apply SNP2Vec to perform long-sequence genomics modeling, and we evaluate the effectiveness of our approach on predicting Alzheimer's disease risk in a Chinese cohort. Our approach significantly outperforms existing polygenic risk score methods and all other baselines, including the model that is trained entirely with haploid sequences. We release our code and dataset on https://github.com/HLTCHKUST/snp2vec.
    Data Augmentation for Bayesian Deep Learning. (arXiv:1903.09668v3 [stat.ML] UPDATED)
    Deep Learning (DL) methods have emerged as one of the most powerful tools for functional approximation and prediction. While the representation properties of DL have been well studied, uncertainty quantification remains challenging and largely unexplored. Data augmentation techniques are a natural approach to provide uncertainty quantification and to incorporate stochastic Monte Carlo search into stochastic gradient descent (SGD) methods. The purpose of our paper is to show that training DL architectures with data augmentation leads to efficiency gains. We use the theory of scale mixtures of normals to derive data augmentation strategies for deep learning. This allows variants of the expectation-maximization and MCMC algorithms to be brought to bear on these high dimensional nonlinear deep learning models. To demonstrate our methodology, we develop data augmentation algorithms for a variety of commonly used activation functions: logit, ReLU, leaky ReLU and SVM. Our methodology is compared to traditional stochastic gradient descent with back-propagation. Our optimization procedure leads to a version of iteratively re-weighted least squares and can be implemented at scale with accelerated linear algebra methods providing substantial improvement in speed. We illustrate our methodology on a number of standard datasets. Finally, we conclude with directions for future research.
    Twitter User Representation Using Weakly Supervised Graph Embedding. (arXiv:2108.08988v3 [cs.CL] UPDATED)
Social media platforms provide convenient means for users to participate in multiple online activities on various contents and create fast widespread interactions. However, this rapidly growing access has also increased the diversity of information, and characterizing user types to understand people's lifestyle decisions shared in social media is challenging. In this paper, we propose a weakly supervised graph embedding based framework for understanding user types. We evaluate the user embedding learned using weak supervision over well-being related tweets from Twitter, focusing on 'Yoga' and 'Keto diet'. Experiments on real-world datasets demonstrate that the proposed framework outperforms the baselines for detecting user types. Finally, we illustrate data analysis on different types of users (e.g., practitioner vs. promotional) from our dataset. While we focus on lifestyle-related tweets (i.e., yoga, keto), our method for constructing user representations readily generalizes to other domains.
    ICSML: Industrial Control Systems Machine Learning Inference Framework natively executing on IEC 61131-3 compliant devices. (arXiv:2202.10075v2 [cs.LG] UPDATED)
    Industrial Control Systems (ICS) have played a catalytic role in enabling the 4th Industrial Revolution. ICS devices like Programmable Logic Controllers (PLCs), automate, monitor, and control critical processes in industrial, energy, and commercial environments. The convergence of traditional Operational Technology (OT) with Information Technology (IT) has opened a new and unique threat landscape. This has inspired defense research that focuses heavily on Machine Learning (ML) based anomaly detection methods that run on external IT hardware, which means an increase in costs and the further expansion of the threat landscape. To remove this requirement, we introduce the ICS machine learning inference framework (ICSML) which enables the execution of ML model inference natively on the PLC. ICSML is implemented in IEC 61131-3 code and provides several optimizations to bypass the limitations imposed by the domain-specific languages. Therefore, it works \emph{on every PLC without the need for vendor support}. ICSML provides a complete set of components for the creation of full ML models similarly to established ML frameworks. We run a series of benchmarks studying memory and performance and compare our solution to the TFLite inference framework. At the same time, we develop domain-specific model optimizations to improve the efficiency of ICSML. To demonstrate the abilities of ICSML, we evaluate a case study of a real defense for process-aware attacks targeting a desalination plant.
    Measurement-based Admission Control in Sliced Networks: A Best Arm Identification Approach. (arXiv:2204.06910v1 [cs.NI])
    In sliced networks, the shared tenancy of slices requires adaptive admission control of data flows, based on measurements of network resources. In this paper, we investigate the design of measurement-based admission control schemes, deciding whether a new data flow can be admitted and in this case, on which slice. The objective is to devise a joint measurement and decision strategy that returns a correct decision (e.g., the least loaded slice) with a certain level of confidence while minimizing the measurement cost (the number of measurements made before committing to the decision). We study the design of such strategies for several natural admission criteria specifying what a correct decision is. For each of these criteria, using tools from best arm identification in bandits, we first derive an explicit information-theoretical lower bound on the cost of any algorithm returning the correct decision with fixed confidence. We then devise a joint measurement and decision strategy achieving this theoretical limit. We compare empirically the measurement costs of these strategies, and compare them both to the lower bounds as well as a naive measurement scheme. We find that our algorithm significantly outperforms the naive scheme (by a factor $2-8$).
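The fixed-confidence measurement strategies the paper builds on come from best arm identification; a simplified successive-elimination sketch (a standard baseline, not the paper's lower-bound-matching algorithm; all names are illustrative) for finding the least-loaded slice:

```python
import numpy as np

def least_loaded_slice(measure, n_slices, delta=0.05, batch=20, max_rounds=500):
    """Successive elimination: identify the slice with the lowest mean load.

    `measure(i)` returns one noisy load measurement of slice i, assumed in [0, 1].
    Slices whose confidence interval lies clearly above the current best are
    dropped; sampling stops when a single candidate remains.
    """
    active = list(range(n_slices))
    sums = np.zeros(n_slices)
    counts = np.zeros(n_slices)
    for _ in range(max_rounds):
        for i in active:
            for _ in range(batch):
                sums[i] += measure(i)
                counts[i] += 1
        means = sums / np.maximum(counts, 1)
        # Hoeffding-style confidence radius with a union bound over slices
        rad = np.sqrt(np.log(2 * n_slices * np.maximum(counts, 1) / delta)
                      / (2 * np.maximum(counts, 1)))
        best = min(active, key=lambda i: means[i])
        active = [i for i in active
                  if means[i] - rad[i] <= means[best] + rad[best]]
        if len(active) == 1:
            return active[0]
    return min(active, key=lambda i: means[i])

rng = np.random.default_rng(0)
loads = [0.7, 0.4, 0.6]                      # true mean load of each slice
choice = least_loaded_slice(lambda i: loads[i] + rng.uniform(-0.3, 0.3), 3)
```

The measurement cost of such a scheme scales with the inverse squared gaps between slice loads, which is what the paper's information-theoretic lower bounds make precise.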
    Surface Similarity Parameter: A New Machine Learning Loss Metric for Oscillatory Spatio-Temporal Data. (arXiv:2204.06843v1 [cs.LG])
    Supervised machine learning approaches require the formulation of a loss functional to be minimized in the training phase. Sequential data are ubiquitous across many fields of research, and are often treated with Euclidean distance-based loss functions that were designed for tabular data. For smooth oscillatory data, those conventional approaches lack the ability to penalize amplitude, frequency and phase prediction errors at the same time, and tend to be biased towards amplitude errors. We introduce the surface similarity parameter (SSP) as a novel loss function that is especially useful for training machine learning models on smooth oscillatory sequences. Our extensive experiments on chaotic spatio-temporal dynamical systems indicate that the SSP is beneficial for shaping gradients, thereby accelerating the training process, reducing the final prediction error, and implementing a stronger regularization effect compared to using classical loss functions. The results indicate the potential of the novel loss metric particularly for highly complex and chaotic data, such as data stemming from the nonlinear two-dimensional Kuramoto-Sivashinsky equation and the linear propagation of dispersive surface gravity waves in fluids.
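The surface similarity parameter originates in ocean-wave research; one common form, assumed here for illustration, is a normalized error between the Fourier transforms of the two signals, so amplitude, frequency and phase mismatches all increase it:

```python
import numpy as np

def ssp(y_true, y_pred):
    """Surface similarity parameter: normalized error in Fourier space.

    Equals 0 for identical signals and 1 for signals in perfect antiphase,
    penalizing amplitude, frequency and phase errors jointly.
    """
    F1, F2 = np.fft.fft(y_true), np.fft.fft(y_pred)
    num = np.sqrt(np.sum(np.abs(F1 - F2) ** 2))
    den = np.sqrt(np.sum(np.abs(F1) ** 2)) + np.sqrt(np.sum(np.abs(F2) ** 2))
    return num / den

t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
s_same = ssp(np.sin(t), np.sin(t))      # identical signals -> 0
s_anti = ssp(np.sin(t), -np.sin(t))     # perfectly out of phase -> 1
```

Because the metric is bounded in [0, 1] and differentiable almost everywhere, it can be used directly as a training loss in place of mean squared error.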
    Exploring the Distributed Knowledge Congruence in Proxy-data-free Federated Distillation. (arXiv:2204.07028v1 [cs.LG])
    Federated learning (FL) is a distributed machine learning paradigm in which the server periodically aggregates local model parameters from clients without assembling their private data. User-constrained communication bandwidth and the requirement for personalized models pose severe challenges to FL. Federated distillation (FD) is proposed to simultaneously address the two problems, which exchanges knowledge between the server and clients, supporting heterogeneous local models while significantly reducing communication overhead. However, most existing FD methods require a proxy dataset, which is often unavailable. Proxy-data-free FD approaches eliminate the need for additional public data beyond clients' private data, but suffer from remarkable discrepancy among local knowledge due to model heterogeneity, leading to ambiguous representation on the server and inevitable accuracy degradation. To tackle this issue, we propose a proxy-data-free FD algorithm based on distributed knowledge congruence (FedDKC). FedDKC leverages well-designed refinement strategies to narrow local knowledge differences into an acceptable upper bound to mitigate the negative effects of knowledge incongruence. Specifically, from perspectives of peak probability and Shannon entropy of local knowledge, we design kernel-based knowledge refinement (KKR) and searching-based knowledge refinement (SKR) respectively, and theoretically guarantee the refined-local knowledge can satisfy an approximately-similar distribution and be regarded as congruent. Extensive experiments conducted on three common datasets demonstrate that our proposed FedDKC method outperforms the state-of-the-art in 93.33% of comparisons, and achieves faster convergence without increasing communication overhead.
    Assessing the communication gap between AI models and healthcare professionals: explainability, utility and trust in AI-driven clinical decision-making. (arXiv:2204.05030v2 [cs.AI] UPDATED)
This paper contributes a pragmatic evaluation framework for explainable Machine Learning (ML) models for clinical decision support. The study revealed a more nuanced role for ML explanation models when these are pragmatically embedded in the clinical context. Despite the general positive attitude of healthcare professionals (HCPs) towards explanations as a safety and trust mechanism, for a significant set of participants there were negative effects associated with confirmation bias, accentuating model over-reliance and increased effort to interact with the model. Also, contradicting one of its main intended functions, standard explanatory models showed limited ability to support a critical understanding of the limitations of the model. However, we found new significant positive effects which reposition the role of explanations within a clinical context: these include reduction of automation bias, addressing ambiguous clinical cases (cases where HCPs were not certain about their decision) and support of less experienced HCPs in the acquisition of new domain knowledge.
    The multi-modal universe of fast-fashion: the Visuelle 2.0 benchmark. (arXiv:2204.06972v1 [cs.CV])
We present Visuelle 2.0, the first dataset useful for facing diverse prediction problems that a fast-fashion company has to manage routinely. Furthermore, we demonstrate how the use of computer vision is substantial in this scenario. Visuelle 2.0 contains data for 6 seasons / 5355 clothing products of Nuna Lie, a famous Italian company with hundreds of shops located in different areas within the country. In particular, we focus on a specific prediction problem, namely short-observation new product sale forecasting (SO-fore). SO-fore assumes that the season has started and a set of new products is on the shelves of the different stores. The goal is to forecast the sales for a particular horizon, given a short, available past (few weeks), since no earlier statistics are available. To be successful, SO-fore approaches should capture this short past and exploit other modalities or exogenous data. To these aims, Visuelle 2.0 is equipped with disaggregated data at the item-shop level and multi-modal information for each clothing item, allowing computer vision approaches to come into play. The main message that we deliver is that using image data with deep networks boosts the performance obtained with time series alone in long-term forecasting scenarios, improving the WAPE by 8.2% and the MAE by 7.7%. The dataset is available at: https://humaticslab.github.io/forecasting/visuelle.
    Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v2 [stat.ML] UPDATED)
    This paper presents new machine learning approaches to approximate the solutions of optimal stopping problems. The key idea of these methods is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees are provided. Our randomized reinforcement learning approach and randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches in Markovian and non-Markovian examples, respectively. In particular, we test our approaches on Black-Scholes, Heston, rough Heston and fractional Brownian motion. Moreover, we show that they can also be used to efficiently compute Greeks of American options.
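The building block of these methods is a network whose hidden layer is random and frozen, so training reduces to a linear least-squares fit of the output layer. A minimal regression sketch under that setup (in the paper the regression targets would be continuation values along simulated paths; all names here are illustrative):

```python
import numpy as np

def fit_randomized_nn(X, y, width=200, reg=1e-6, seed=0):
    """One hidden layer with random, frozen weights; only the linear
    readout is trained, via ridge regression in closed form."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], width))   # frozen random input weights
    b = rng.normal(size=width)                 # frozen random biases
    features = lambda Z: np.tanh(Z @ W + b)
    H = features(X)
    # Closed-form ridge regression for the output layer
    beta = np.linalg.solve(H.T @ H + reg * np.eye(width), H.T @ y)
    return lambda Z: features(Z) @ beta

X = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = np.sin(3.0 * X[:, 0])
predict = fit_randomized_nn(X, y)
err = float(np.max(np.abs(predict(X) - y)))    # training error of the fit
```

Avoiding backpropagation through the hidden layer is what makes these approximators cheap enough to embed inside a backward induction over exercise dates.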
    HCR-Net: A deep learning based script independent handwritten character recognition network. (arXiv:2108.06663v2 [cs.CV] UPDATED)
Despite being studied extensively for a few decades, handwritten character recognition (HCR) is considered a challenging learning problem in pattern recognition, and there is very limited research on script independent models. This is mainly because of the diversity of scripts, the focus of conventional research on handcrafted feature extraction techniques, and the unavailability of public datasets and codes to reproduce the results. On the other hand, deep learning has witnessed huge success in different areas of pattern recognition, including HCR, and provides end-to-end learning, but it has been studied for specific scripts only. In this paper, we propose a novel deep learning architecture, called HCR-Net, which exploits transfer learning and image-augmentation for end-to-end learning of script independent handwritten character recognition. HCR-Net is based on a novel transfer learning approach for HCR, where some of the lower layers of a pre-trained network are utilized. Due to transfer learning and image-augmentation, HCR-Net provides faster training, better performance and better generalization, and can achieve up to 99\% of its final accuracy in just the first epoch. The experimental results on publicly available datasets of Bangla, Punjabi, Hindi, English, Swedish, Urdu, Farsi, Tibetan, Kannada, Malayalam, Telugu, Marathi, Nepali and Arabic languages prove the efficacy of HCR-Net and establish several new benchmarks. For reproducibility of the results and for the advancement of HCR research, the complete code is publicly released at https://github.com/jmdvinodjmd/HCR-Net.
    A Study of Low-Resource Speech Commands Recognition based on Adversarial Reprogramming. (arXiv:2110.03894v2 [eess.AS] UPDATED)
    In this study, we propose a novel adversarial reprogramming (AR) approach for low-resource spoken command recognition (SCR), and build an AR-SCR system. The AR procedure aims to modify the acoustic signals (from the target domain) to repurpose a pretrained SCR model (from the source domain). To solve the label mismatches between source and target domains, and further improve the stability of AR, we propose a novel similarity-based label mapping technique to align classes. In addition, the transfer learning (TL) technique is combined with the original AR process to improve the model adaptation capability. We evaluate the proposed AR-SCR system on three low-resource SCR datasets, including Arabic, Lithuanian, and dysarthric Mandarin speech. Experimental results show that with a pretrained AM trained on a large-scale English dataset, the proposed AR-SCR system outperforms the current state-of-the-art results on Arabic and Lithuanian speech commands datasets, with only a limited amount of training data.
    Regret, stability & fairness in matching markets with bandit learners. (arXiv:2102.06246v2 [cs.LG] UPDATED)
    Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.
    Real-time Adversarial Perturbations against Deep Reinforcement Learning Policies: Attacks and Defenses. (arXiv:2106.08746v3 [cs.LG] UPDATED)
Recent work has shown that deep reinforcement learning (DRL) policies are vulnerable to adversarial perturbations. Adversaries can mislead policies of DRL agents by perturbing the state of the environment observed by the agents. Existing attacks are feasible in principle but face challenges in practice, either by being too slow to fool DRL policies in real time or by modifying past observations stored in the agent's memory. We show that using the Universal Adversarial Perturbation (UAP) method to compute perturbations, independent of the individual inputs to which they are applied, can fool DRL policies effectively and in real time. We describe three such attack variants. Via an extensive evaluation using three Atari 2600 games, we show that our attacks are effective, as they fully degrade the performance of three different DRL agents (up to 100%, even when the $l_\infty$ bound on the perturbation is as small as 0.01). Our attack is faster than the response time (0.6ms on average) of different DRL policies, and considerably faster than prior attacks using adversarial perturbations (1.8ms on average). We also show that our attack technique is efficient, incurring an online computational cost of 0.027ms on average. Using two further tasks involving robotic movement, we confirm that our results generalize to more complex DRL tasks. Furthermore, we demonstrate that the effectiveness of known defenses diminishes against universal perturbations. We propose an effective technique that detects all known adversarial perturbations against DRL policies, including all the universal perturbations presented in this paper.
    Extracting Finite Automata from RNNs Using State Merging. (arXiv:2201.12451v3 [cs.LG] UPDATED)
    One way to interpret the behavior of a blackbox recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior. In this work, we propose a new method for extracting finite automata from RNNs inspired by the state merging paradigm from grammatical inference. We demonstrate the effectiveness of our method on the Tomita languages benchmark, where we find that it is able to extract faithful automata from RNNs trained on all languages in the benchmark. We find that extraction performance is aided by the amount of data provided during the extraction process, as well as, curiously, by whether the RNN model is trained for additional epochs after perfectly learning its target language. We use our method to analyze this phenomenon, finding that training beyond convergence is useful because it leads to compression of the internal state space of the RNN. This finding demonstrates how our method can be used for interpretability and analysis of trained RNN models.
    SkillNet: A Sparsely Activated Model for General-Purpose Natural Language Understanding. (arXiv:2203.03312v2 [cs.CL] UPDATED)
    Prevailing deep models are single-purpose and overspecialize at individual tasks. However, when extended to new tasks, they typically forget previously learned skills and learn from scratch. We address this issue by introducing SkillNet, a general-purpose model that stitches together existing skills to learn new tasks more effectively. The key feature of our approach is that it is sparsely activated, guided by predefined skills. Different from traditional dense models that always activate all the model parameters, SkillNet only activates parts of the model parameters whose skills are relevant to the target task. When learning a new task, our approach precisely activates required skills and also provides an option to add new skills. We evaluate on natural language understanding tasks and have the following findings. First, with only one model checkpoint, SkillNet performs better than task-specific fine-tuning and two multi-task learning baselines (i.e., dense model and Mixture-of-Experts model) on six tasks. Second, sparsely activated pre-training further improves the overall performance. Third, SkillNet significantly outperforms baseline systems when extended to new tasks.
    Fine-Grained Population Mobility Data-Based Community-Level COVID-19 Prediction Model. (arXiv:2202.06257v2 [cs.LG] UPDATED)
    Predicting the number of infections in the anti-epidemic process is extremely beneficial to the government in developing anti-epidemic strategies, especially in fine-grained geographic units. Previous works focus on low-spatial-resolution prediction, e.g., county-level, and preprocess data to the same geographic level, which loses some useful information. In this paper, we propose a fine-grained population mobility data-based model (FGC-COVID) utilizing data of two geographic levels for community-level COVID-19 prediction. We use the population mobility data between Census Block Groups (CBGs), which is a finer-grained geographic level than community, to build the graph and capture the dependencies between CBGs using graph neural networks (GNNs). To mine patterns at as fine a granularity as possible for prediction, a spatial weighted aggregation module is introduced to aggregate the embeddings of CBGs to community level based on their geographic affiliation and spatial autocorrelation. Extensive experiments on 300 days of LA City COVID-19 data indicate that our model outperforms existing forecasting models on community-level COVID-19 prediction.
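    The spatial weighted aggregation step can be sketched as a weighted mean of CBG embeddings grouped by community membership. The weight values, embedding dimensions, and the simple weighted-mean form below are illustrative assumptions, not the paper's exact module.

```python
import numpy as np

def aggregate_to_community(cbg_embeddings, weights, membership):
    """Weighted aggregation of CBG embeddings into community embeddings.

    cbg_embeddings: (n_cbg, d) array of CBG-level embeddings.
    weights:        (n_cbg,) spatial weights (e.g. from autocorrelation).
    membership:     (n_cbg,) community index of each CBG.
    Returns an (n_communities, d) array of community embeddings.
    """
    n_comm = int(membership.max()) + 1
    out = np.zeros((n_comm, cbg_embeddings.shape[1]))
    norm = np.zeros(n_comm)
    for i, c in enumerate(membership):
        out[c] += weights[i] * cbg_embeddings[i]
        norm[c] += weights[i]
    return out / norm[:, None]

# Toy example: two CBGs in community 0, one (double-weighted) in community 1.
emb = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
w = np.array([1.0, 1.0, 2.0])
member = np.array([0, 0, 1])
comm = aggregate_to_community(emb, w, member)
```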
    Semi-Discriminative Representation Loss for Online Continual Learning. (arXiv:2006.11234v4 [stat.ML] UPDATED)
    The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.
    DeePN$^2$: A deep learning-based non-Newtonian hydrodynamic model. (arXiv:2112.14798v3 [physics.comp-ph] UPDATED)
    A long-standing problem in the modeling of non-Newtonian hydrodynamics of polymeric flows is the availability of reliable and interpretable hydrodynamic models that faithfully encode the underlying micro-scale polymer dynamics. The main complication arises from the long polymer relaxation times, the complex molecular structure, and heterogeneous interactions. DeePN$^2$, a deep learning-based non-Newtonian hydrodynamic model, has been proposed and has shown some success in systematically passing the micro-scale structural mechanics information to the macro-scale hydrodynamics for suspensions with simple polymer conformation and bond potential. The model retains a multi-scale nature by mapping the polymer configurations into a set of symmetry-preserving macro-scale features. The extended constitutive laws for these macro-scale features can be directly learned from the kinetics of their micro-scale counterparts. In this paper, we develop DeePN$^2$ using more complex micro-structural models. We show that DeePN$^2$ can faithfully capture the broadly overlooked viscoelastic differences arising from the specific molecular structural mechanics without human intervention.
    Stream-based Active Learning with Verification Latency in Non-stationary Environments. (arXiv:2204.06822v1 [cs.LG])
    Data stream classification is an important problem in the field of machine learning. Due to the non-stationary nature of the data where the underlying distribution changes over time (concept drift), the model needs to continuously adapt to new data statistics. Stream-based Active Learning (AL) approaches address this problem by interactively querying a human expert to provide new data labels for the most recent samples, within a limited budget. Existing AL strategies assume that labels are immediately available, while in a real-world scenario the expert requires time to provide a queried label (verification latency), and by the time the requested labels arrive they may not be relevant anymore. In this article, we investigate the influence of finite, time-variable, and unknown verification delay, in the presence of concept drift, on AL approaches. We propose PRopagate (PR), a latency-independent utility estimator which also predicts the requested, but not yet known, labels. Furthermore, we propose a drift-dependent dynamic budget strategy, which uses a variable distribution of the labelling budget over time after a detected drift. A thorough experimental evaluation, with both synthetic and real-world non-stationary datasets and different settings of verification latency and budget, is conducted and analyzed. We empirically show that the proposed method consistently outperforms the state-of-the-art. Additionally, we demonstrate that with variable budget allocation in time, it is possible to boost the performance of AL strategies without increasing the overall labeling budget.
    LEFM-Nets: Learnable Explicit Feature Map Deep Networks for Segmentation of Histopathological Images of Frozen Sections. (arXiv:2204.06955v1 [eess.IV])
    Accurate segmentation of medical images is essential for diagnosis and treatment of diseases. These problems are solved by highly complex models, such as deep networks (DNs), requiring a large amount of labeled data for training. Thereby, many DNs possess task- or imaging-modality-specific architectures with a decision-making process that is often hard to explain and interpret. Here, we propose a framework that embeds existing DNs into a low-dimensional subspace induced by the learnable explicit feature map (LEFM) layer. Compared to the existing DN, the framework adds one hyperparameter and only modestly increases the number of learnable parameters. The method is aimed at, but not limited to, segmentation of low-dimensional medical images, such as color histopathological images of stained frozen sections. Since features in the LEFM layer are polynomial functions of the original features, the proposed LEFM-Nets contribute to the interpretability of network decisions. In this work, we combined LEFM with the known networks DeepLabv3+, UNet, UNet++ and MA-net. The new LEFM-Nets are applied to the segmentation of adenocarcinoma of a colon in a liver from images of hematoxylin and eosin (H&E) stained frozen sections. LEFM-Nets are also tested on nuclei segmentation from images of H&E stained frozen sections of ten human organs. On the first problem, LEFM-Nets achieved statistically significant performance improvement over the original networks in terms of micro balanced accuracy and $F_1$ score. On the second problem, LEFM-Nets achieved only modestly better performance than the original networks. The source code is available at https://github.com/dsitnik/lefm.
    A Level Set Theory for Neural Implicit Evolution under Explicit Flows. (arXiv:2204.07159v1 [cs.CV])
    Coordinate-based neural networks parameterizing implicit surfaces have emerged as efficient representations of geometry. They effectively act as parametric level sets with the zero-level set defining the surface of interest. We present a framework that allows applying deformation operations defined for triangle meshes onto such implicit surfaces. Several of these operations can be viewed as energy-minimization problems that induce an instantaneous flow field on the explicit surface. Our method uses the flow field to deform parametric implicit surfaces by extending the classical theory of level sets. We also derive a consolidated view for existing methods on differentiable surface extraction and rendering, by formalizing connections to the level-set theory. We show that these methods drift from the theory and that our approach exhibits improvements for applications like surface smoothing, mean-curvature flow, inverse rendering and user-defined editing on implicit geometry.
    Program Analysis of Probabilistic Programs. (arXiv:2204.06868v1 [cs.PL])
    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.
    Global Counterfactual Explanations: Investigations, Implementations and Improvements. (arXiv:2204.06917v1 [cs.LG])
    Counterfactual explanations have been widely studied in explainability, with a range of application dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance-level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.
    METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals. (arXiv:2204.06644v1 [cs.LG])
    We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model. Originated in ELECTRA, this training strategy has demonstrated sample-efficiency to pretrain models at the scale of hundreds of millions of parameters. In this work, we conduct a comprehensive empirical study, and propose a recipe, namely "Model generated dEnoising TRaining Objective" (METRO), which incorporates some of the best modeling techniques developed recently to speed up, stabilize, and enhance pretrained language models without compromising model effectiveness. The resultant models, METRO-LM, consisting of up to 5.4 billion parameters, achieve new state-of-the-art on the GLUE, SuperGLUE, and SQuAD benchmarks. More importantly, the METRO-LM models are efficient in that they often outperform previous large models with significantly smaller model sizes and lower pretraining cost.
    Performance Assessment of different Machine Learning Algorithm for Life-Time Prediction of Solder Joints based on Synthetic Data. (arXiv:2204.06627v1 [cs.LG])
    This paper proposes a computationally efficient methodology to predict the damage progression in solder contacts of electronic components using temperature-time curves. For this purpose, two machine learning algorithms, a Multilayer Perceptron and a Long Short-Term Memory network, are trained and compared with respect to their prediction accuracy and the required amount of training data. The training is performed using synthetic, normally distributed data that is realistic for automotive applications. A finite element model of a simple bipolar chip resistor in surface mount technology configuration is used to numerically compute the synthetic data. As a result, both machine learning algorithms show a relevant accuracy for the prediction of accumulated creep strains. With a training data length of 350 hours (12.5% of the available training data), both models show consistently good fitting performance, with an $R^2$ of 0.72 for the Multilayer Perceptron and an $R^2$ of 0.87 for the Long Short-Term Memory network. The prediction errors of the accumulated creep strains are less than 10% with 350 hours of training data and decrease to less than 5% when using further data. Therefore, both approaches are promising for lifetime prediction directly on the electronic device.
    Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning. (arXiv:2204.06645v1 [cs.LG])
    In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a parameter-free nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise quadratic Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global techniques.
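    A toy 1D version of the pipeline, assuming scipy is available: it uses the 1-Wasserstein distance between empirical measures (the paper uses quadratic Wasserstein distances on images) followed by classical multidimensional scaling, and recovers a translation parameter exactly, mirroring the exact-recovery claim for translation manifolds.

```python
import numpy as np
from scipy.stats import wasserstein_distance

def wassmap_1d(samples, dim=1):
    """Toy 1D Wassmap: pairwise Wasserstein distances between empirical
    measures, followed by classical MDS on the squared-distance matrix."""
    n = len(samples)
    D2 = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_distance(samples[i], samples[j])
            D2[i, j] = D2[j, i] = d ** 2
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ D2 @ J                      # Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Translates of one point cloud: pairwise Wasserstein distance equals the
# translation offset, so MDS recovers the offsets up to sign and centering.
rng = np.random.default_rng(0)
base = rng.standard_normal(200)
shifts = [0.0, 1.0, 2.0, 3.0]
emb = wassmap_1d([base + s for s in shifts])
```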
    Question rewriting? Assessing its importance for conversational question answering. (arXiv:2201.09146v2 [cs.CL] UPDATED)
    In conversational question answering, systems must correctly interpret the interconnected interactions and generate knowledgeable answers, which may require the retrieval of relevant information from a background repository. Recent approaches to this problem leverage neural language models, although different alternatives can be considered in terms of modules for (a) representing user questions in context, (b) retrieving the relevant background information, and (c) generating the answer. This work presents a conversational question answering system designed specifically for the Search-Oriented Conversational AI (SCAI) shared task, and reports on a detailed analysis of its question rewriting module. In particular, we considered different variations of the question rewriting module to evaluate the influence on the subsequent components, and performed a careful analysis of the results obtained with the best system configuration. Our system achieved the best performance in the shared task and our analysis emphasizes the importance of the conversation context representation for the overall system performance.
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v3 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting), and the number of tasks $N$ as well as the total number of rounds $T$ are known ($N$ could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(N\sqrt{M \tau}+N^{2/3})$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2})$ regret.
    LSTM-Autoencoder based Anomaly Detection for Indoor Air Quality Time Series Data. (arXiv:2204.06701v1 [cs.LG])
    Anomaly detection for indoor air quality (IAQ) data has become an important area of research as the quality of air is closely related to human health and well-being. However, traditional statistics and shallow machine learning-based approaches to anomaly detection in the IAQ area cannot detect anomalies involving correlations across several data points (often referred to as long-term dependencies). We propose a hybrid deep learning model that combines an LSTM with an Autoencoder for anomaly detection tasks in IAQ to address this issue. In our approach, the LSTM network is comprised of multiple LSTM cells that work with each other to learn the long-term dependencies of the data in a time-series sequence. The Autoencoder identifies the optimal threshold based on the reconstruction loss rates evaluated across all time-series sequences. Our experimental results, based on the Dunedin CO2 time-series dataset obtained through a real-world deployment in schools in New Zealand, demonstrate a very high and robust accuracy rate (99.50%) that outperforms other similar models.
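    The threshold-selection step can be illustrated with a simple quantile rule on reconstruction errors; the quantile value and the synthetic error distribution below are assumptions standing in for the losses produced by the learned LSTM-Autoencoder.

```python
import numpy as np

def detect_anomalies(reconstruction_errors, quantile=0.98):
    """Flag sequences whose reconstruction error exceeds a threshold
    chosen from the empirical error distribution (a common stand-in for
    the loss-based threshold selection the paper describes)."""
    threshold = np.quantile(reconstruction_errors, quantile)
    return reconstruction_errors > threshold, threshold

# Synthetic errors: well-reconstructed normal sequences have small loss,
# anomalous sequences have large loss.
rng = np.random.default_rng(0)
errors = np.concatenate([rng.random(990) * 0.1,   # normal sequences
                         rng.random(10) + 0.9])   # anomalous sequences
flags, thr = detect_anomalies(errors)
```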
    BottleFit: Learning Compressed Representations in Deep Neural Networks for Effective and Efficient Split Computing. (arXiv:2201.02693v2 [cs.LG] UPDATED)
    Although mission-critical applications require the use of deep neural networks (DNNs), their continuous execution at mobile devices results in a significant increase in energy consumption. While edge offloading can decrease energy consumption, erratic patterns in channel quality, network and edge server load can lead to severe disruption of the system's key operations. An alternative approach, called split computing, generates compressed representations within the model (called "bottlenecks"), to reduce bandwidth usage and energy consumption. Prior work has proposed approaches that introduce additional layers, to the detriment of energy consumption and latency. For this reason, we propose a new framework called BottleFit, which, in addition to targeted DNN architecture modifications, includes a novel training strategy to achieve high accuracy even with strong compression rates. We apply BottleFit to cutting-edge DNN models in image classification, and show that BottleFit achieves 77.1% data compression with up to 0.6% accuracy loss on the ImageNet dataset, while state-of-the-art approaches such as SPINN lose up to 6% in accuracy. We experimentally measure the power consumption and latency of an image classification application running on an NVIDIA Jetson Nano board (GPU-based) and a Raspberry PI board (GPU-less). We show that BottleFit decreases power consumption and latency respectively by up to 49% and 89% with respect to (w.r.t.) local computing and by 37% and 55% w.r.t. edge offloading. We also compare BottleFit with state-of-the-art autoencoder-based approaches, and show that (i) BottleFit reduces power consumption and execution time respectively by up to 54% and 44% on the Jetson and 40% and 62% on the Raspberry PI; (ii) the size of the head model executed on the mobile device is 83 times smaller. We publish the code repository for reproducibility of the results in this study.
    A Natural Language Processing Approach for Instruction Set Architecture Identification. (arXiv:2204.06624v1 [cs.CR])
    Binary analysis of software is a critical step in cyber forensics applications such as program vulnerability assessment and malware detection. This involves interpreting instructions executed by software and often necessitates converting the software's binary file data to assembly language. The conversion process requires information about the binary file's target instruction set architecture (ISA). However, ISA information might not be included in binary files due to compilation errors, partial downloads, or adversarial corruption of file metadata. Machine learning (ML) is a promising methodology that can be used to identify the target ISA using binary data in the object code section of binary files. In this paper we propose a binary code feature extraction model to improve the accuracy and scalability of ML-based ISA identification methods. Our feature extraction model can be used in the absence of domain knowledge about the ISAs. Specifically, we adapt models from natural language processing (NLP) to i) identify successive byte patterns commonly observed in binary codes, ii) estimate the significance of each byte pattern to a binary file, and iii) estimate the relevance of each byte pattern in distinguishing between ISAs. We introduce character-level features of encoded binaries to identify fine-grained bit patterns inherent to each ISA. We use a dataset with binaries from 12 different ISAs to evaluate our approach. Empirical evaluations show that using our byte-level features in ML-based ISA identification results in an 8% higher accuracy than the state-of-the-art features based on byte-histograms and byte pattern signatures. We observe that character-level features allow reducing the size of the feature set by up to 16x while maintaining accuracy above 97%.
    Sign Bit is Enough: A Learning Synchronization Framework for Multi-hop All-reduce with Ultimate Compression. (arXiv:2204.06787v1 [cs.LG])
    Traditional one-bit compressed stochastic gradient descent cannot be directly employed in multi-hop all-reduce, a widely adopted distributed training paradigm in network-intensive high-performance computing systems such as public clouds. According to our theoretical findings, due to the cascading compression, the training process suffers considerable deterioration in convergence performance. To overcome this limitation, we implement a sign-bit compression-based learning synchronization framework, Marsit. It prevents cascading compression via an elaborate bit-wise operation for unbiased sign aggregation and a specific global compensation mechanism for mitigating compression deviation. The proposed framework retains the same theoretical convergence rate as non-compression mechanisms. Experimental results demonstrate that Marsit reduces training time by up to 35% while preserving the same accuracy as training without compression.
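    The core idea of one-bit sign compression with a single aggregation, rather than re-compressing at every hop, can be sketched as follows. Majority-vote aggregation is a standard stand-in here, not Marsit's specific bit-wise operation or its compensation mechanism.

```python
import numpy as np

def sign_compress(grad):
    """One-bit compression: keep only the sign of each gradient entry."""
    return np.sign(grad).astype(np.int8)

def majority_aggregate(signs):
    """Aggregate worker sign bits in one step (majority vote), avoiding
    the cascading re-compression that degrades naive multi-hop all-reduce."""
    return np.sign(np.sum(signs, axis=0)).astype(np.int8)

rng = np.random.default_rng(0)
grads = rng.standard_normal((4, 8))            # 4 workers, 8 parameters
agg = majority_aggregate([sign_compress(g) for g in grads])
```

    Cascading compression would instead re-apply `sign_compress` at every hop of the reduction tree, which is exactly the biased pipeline the framework is designed to avoid.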
    Efficient and practical quantum compiler towards multi-qubit systems with deep reinforcement learning. (arXiv:2204.06904v1 [quant-ph])
    Efficient quantum compiling tactics greatly enhance the capability of quantum computers to execute complicated quantum algorithms. Due to its fundamental importance, a plethora of quantum compilers has been designed in past years. However, there are several caveats to current protocols, namely low optimality, high inference time, limited scalability, and lack of universality. To compensate for these defects, here we devise an efficient and practical quantum compiler assisted by advanced deep reinforcement learning (RL) techniques, i.e., data generation, deep Q-learning, and AQ* search. In this way, our protocol is compatible with various quantum machines and can be used to compile multi-qubit operators. We systematically evaluate the performance of our proposal in compiling quantum operators with both inverse-closed and inverse-free universal basis sets. In the task of single-qubit operator compiling, our proposal outperforms other RL-based quantum compilers in the measure of compiling sequence length and inference time. Meanwhile, the output solution is near-optimal, guaranteed by the Solovay-Kitaev theorem. Notably, for the inverse-free universal basis set, the achieved sequence length complexity is comparable with the inverse-based setting and dramatically advances previous methods. These empirical results contribute to improving the inverse-free Solovay-Kitaev theorem. In addition, for the first time, we demonstrate how to leverage RL-based quantum compilers to accomplish two-qubit operator compiling. The achieved results open an avenue for integrating RL with quantum compiling to unify efficiency and practicality and thus facilitate the exploration of quantum advantages.
    MIMO Channel Estimation using Score-Based Generative Models. (arXiv:2204.07122v1 [eess.SP])
    Channel estimation is a critical task in multiple-input multiple-output (MIMO) digital communications that affects end-to-end system performance. In this work, we introduce a novel approach for channel estimation using deep score-based generative models. These models are trained to estimate the gradient of the log-prior distribution, and can be used to iteratively refine estimates, given observed measurements of a signal. We introduce a framework for training score-based generative models for wireless channels, as well as performing channel estimation using posterior sampling at test time. We derive theoretical robustness guarantees of channel estimation with posterior sampling in single-input single-output scenarios, and show that the observations regarding estimation performance are verified experimentally in MIMO channels. Our results in simulated clustered delay line channels show competitive in-distribution performance without error floors in the high signal-to-noise ratio regime, and robust out-of-distribution performance, outperforming competing deep learning methods by up to 5 dB in end-to-end communication performance, while the complexity analysis reveals how model architecture can efficiently trade performance for estimation latency.
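    Posterior sampling with a score function can be sketched with Langevin dynamics. This toy assumes a standard Gaussian prior, whose score is simply `-x`, in place of the paper's learned score network, and a small linear measurement model standing in for the channel.

```python
import numpy as np

def langevin_posterior_sample(y, H, sigma, steps=5000, lr=1e-5):
    """Langevin posterior sampling sketch for y = Hx + n with a standard
    Gaussian prior. Each step follows the posterior score (prior score
    plus likelihood gradient) with injected Gaussian noise."""
    rng = np.random.default_rng(0)
    x = rng.standard_normal(H.shape[1])
    for _ in range(steps):
        score = -x + H.T @ (y - H @ x) / sigma**2   # prior + likelihood
        x = x + lr * score + np.sqrt(2 * lr) * rng.standard_normal(x.shape)
    return x

# Toy "channel" estimation: recover x_true from noisy linear observations.
rng = np.random.default_rng(1)
H = rng.standard_normal((16, 4))
x_true = rng.standard_normal(4)
y = H @ x_true + 0.05 * rng.standard_normal(16)
x_hat = langevin_posterior_sample(y, H, sigma=0.05)
```

    In the paper the prior score comes from a trained network and the sampling is annealed over noise levels; the fixed-step loop above only illustrates the iterative-refinement idea.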
    data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language. (arXiv:2202.03555v2 [cs.LG] UPDATED)
    While the general idea of self-supervised learning is identical across modalities, the actual algorithms and objectives differ widely because they were developed with a single modality in mind. To get us closer to general self-supervised learning, we present data2vec, a framework that uses the same learning method for either speech, NLP or computer vision. The core idea is to predict latent representations of the full input data based on a masked view of the input in a self-distillation setup using a standard Transformer architecture. Instead of predicting modality-specific targets such as words, visual tokens or units of human speech which are local in nature, data2vec predicts contextualized latent representations that contain information from the entire input. Experiments on the major benchmarks of speech recognition, image classification, and natural language understanding demonstrate a new state of the art or competitive performance to predominant approaches.
    From Environmental Sound Representation to Robustness of 2D CNN Models Against Adversarial Attacks. (arXiv:2204.07018v1 [cs.SD])
    This paper investigates the impact of different standard environmental sound representations (spectrograms) on the recognition performance and adversarial attack robustness of a victim residual convolutional neural network, namely ResNet-18. Our main motivation for focusing on such a front-end classifier rather than other complex architectures is balancing recognition accuracy and the total number of training parameters. Herein, we measure the impact of different settings required for generating more informative Mel-frequency cepstral coefficient (MFCC), short-time Fourier transform (STFT), and discrete wavelet transform (DWT) representations on our front-end model. This measurement involves comparing the classification performance over the adversarial robustness. We demonstrate an inverse relationship between recognition accuracy and model robustness against six benchmarking attack algorithms on the balance of average budgets allocated by the adversary and the attack cost. Moreover, our experimental results have shown that while the ResNet-18 model trained on DWT spectrograms achieves a high recognition accuracy, attacking this model is relatively more costly for the adversary than other 2D representations. We also report some results on other convolutional neural network architectures, such as ResNet-34, ResNet-56, AlexNet, GoogLeNet, SB-CNN, and an LSTM-based model.
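    A minimal magnitude STFT spectrogram, the kind of 2D representation such front-end classifiers consume. The window length, hop size, and test tone are illustrative choices, not the paper's settings.

```python
import numpy as np

def stft_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT spectrogram with a Hann window.
    Returns an array of shape (frequency bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T

# A 440 Hz tone sampled at 4096 Hz concentrates energy near bin
# 440 / (4096 / 256) = 27.5 in every frame.
t = np.linspace(0, 1, 4096, endpoint=False)
spec = stft_spectrogram(np.sin(2 * np.pi * 440 * t))
```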
    Learning Convolutional Neural Networks in Frequency Domain. (arXiv:2204.06718v1 [cs.CV])
    Convolutional neural networks (CNNs) have achieved impressive success in the field of computer vision during the past few decades. As the core of CNNs, the image convolution operation helps CNNs achieve good performance on image-related tasks. However, image convolution is hard to implement and parallelize. In this paper, we propose a novel neural network model, namely CEMNet, that can be trained in the frequency domain. The most important motivation of this research is that we can use the very simple element-wise multiplication operation to replace the image convolution in the frequency domain based on the Cross-Correlation Theorem. We further introduce a Weight Fixation Mechanism to alleviate over-fitting, and analyze the working behavior of Batch Normalization, Leaky ReLU, and Dropout in the frequency domain to design their counterparts for CEMNet. Also, to deal with the complex inputs brought by the DFT, we design a two-branch network structure for CEMNet. Experimental results imply that CEMNet works well in the frequency domain, and achieves good performance on the MNIST and CIFAR-10 databases. To our knowledge, CEMNet is the first model trained in the Fourier domain that achieves more than 70\% validation accuracy on CIFAR-10.
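    The frequency-domain identity the model relies on can be checked numerically: element-wise multiplication of 2D DFTs reproduces circular convolution. This is a sanity check of the convolution theorem, not CEMNet's training procedure.

```python
import numpy as np

def fft_conv2d(image, kernel):
    """Circular 2D convolution computed as element-wise multiplication
    in the frequency domain (convolution theorem)."""
    kh, kw = kernel.shape
    padded = np.zeros_like(image)
    padded[:kh, :kw] = kernel          # zero-pad kernel to image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))

rng = np.random.default_rng(0)
img = rng.random((8, 8))
ker = rng.random((3, 3))

# Direct circular convolution for comparison.
direct = np.zeros_like(img)
for u in range(8):
    for v in range(8):
        for i in range(3):
            for j in range(3):
                direct[u, v] += ker[i, j] * img[(u - i) % 8, (v - j) % 8]
```

    Replacing the four nested loops with a single element-wise product in the DFT domain is what makes frequency-domain training easy to implement and parallelize.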
    Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine. (arXiv:2204.07124v1 [stat.ML])
    Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and are explainable. To the best of our knowledge, our paper is the first to adapt causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health records and is thus of direct relevance for personalized medicine.  ( 2 min )
    Joint Coreset Construction and Quantization for Distributed Machine Learning. (arXiv:2204.06652v1 [cs.LG])
    Coresets are small, weighted summaries of larger datasets, aiming at providing provable error bounds for machine learning (ML) tasks while significantly reducing the communication and computation costs. To achieve a better trade-off between ML error bounds and costs, we propose the first framework to incorporate quantization techniques into the process of coreset construction. Specifically, we theoretically analyze the ML error bounds caused by a combination of coreset construction and quantization. Based on that, we formulate an optimization problem to minimize the ML error under a fixed budget of communication cost. To improve the scalability for large datasets, we identify two proxies of the original objective function, for which efficient algorithms are developed. For the case of data on multiple nodes, we further design a novel algorithm to allocate the communication budget to the nodes while minimizing the overall ML error. Through extensive experiments on multiple real-world datasets, we demonstrate the effectiveness and efficiency of our proposed algorithms for a variety of ML tasks. In particular, our algorithms have achieved more than 90% data reduction with less than 10% degradation in ML performance in most cases.  ( 2 min )
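    A greatly simplified sketch of the combination studied above: a sampled, weighted coreset followed by uniform scalar quantization, used to approximate a full-data statistic. The sampling scheme, bit width, and the mean-estimation task are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 5))

# Uniform-sampling coreset with inverse-probability weights.
m = 200
idx = rng.choice(len(X), size=m, replace=False)
coreset, weights = X[idx], np.full(m, len(X) / m)

# Uniform scalar quantization of the coreset to b bits per coordinate.
b = 4
lo, hi = coreset.min(), coreset.max()
levels = 2 ** b
q = np.round((coreset - lo) / (hi - lo) * (levels - 1))
coreset_q = q / (levels - 1) * (hi - lo) + lo

# The weighted, quantized coreset mean approximates the full-data mean
# at a fraction of the communication cost (m * b * d bits here).
approx = (weights[:, None] * coreset_q).sum(0) / weights.sum()
print(np.abs(approx - X.mean(0)).max())
```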
    Control-oriented meta-learning. (arXiv:2204.06716v1 [cs.RO])
    Real-time adaptation is imperative to the control of robots operating in complex, dynamic environments. Adaptive control laws can endow even nonlinear systems with good trajectory tracking performance, provided that any uncertain dynamics terms are linearly parameterizable with known nonlinear features. However, it is often difficult to specify such features a priori, such as for aerodynamic disturbances on rotorcraft or interaction forces between a manipulator arm and various objects. In this paper, we turn to data-driven modeling with neural networks to learn, offline from past data, an adaptive controller with an internal parametric model of these nonlinear features. Our key insight is that we can better prepare the controller for deployment with control-oriented meta-learning of features in closed-loop simulation, rather than regression-oriented meta-learning of features to fit input-output data. Specifically, we meta-learn the adaptive controller with closed-loop tracking simulation as the base-learner and the average tracking error as the meta-objective. With both fully-actuated and underactuated nonlinear planar rotorcraft subject to wind, we demonstrate that our adaptive controller outperforms other controllers trained with regression-oriented meta-learning when deployed in closed-loop for trajectory tracking control.  ( 2 min )
    Leveraging convergence behavior to balance conflicting tasks in multi-task learning. (arXiv:2204.06698v1 [cs.LG])
    Multi-Task Learning is a learning paradigm that uses correlated tasks to improve performance generalization. A common way to learn multiple tasks is through the hard parameter sharing approach, in which a single architecture shares the same subset of parameters across tasks, creating an inductive bias between them during the training process. Due to its simplicity, potential to improve generalization, and reduced computational cost, it has gained the attention of the scientific and industrial communities. However, tasks often conflict with each other, which makes it challenging to define how the gradients of multiple tasks should be combined to allow simultaneous learning. To address this problem, we use ideas from multi-objective optimization to propose a method that takes into account the temporal behaviour of the gradients to create a dynamic bias that adjusts the importance of each task during backpropagation. This method gives more attention to tasks that are diverging or not benefiting during the most recent iterations, helping to ensure that simultaneous learning heads toward maximizing the performance of all tasks. As a result, we empirically show that the proposed method outperforms state-of-the-art approaches on learning conflicting tasks. Unlike the adopted baselines, our method ensures that all tasks reach good generalization performance.  ( 2 min )
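    The idea of a dynamic, convergence-aware task weighting can be sketched as follows; the specific heuristic (weighting tasks by their recent loss trend) is a hypothetical stand-in, not the paper's exact rule:

```python
import numpy as np

def dynamic_task_weights(loss_history, eps=1e-8):
    """Weight tasks by their recent loss trend: tasks whose loss is
    rising (diverging) get larger weights in the combined gradient.
    A toy heuristic for illustration only."""
    hist = np.asarray(loss_history)          # shape (T, n_tasks)
    trend = hist[-1] - hist[0]               # positive => diverging
    w = np.maximum(trend, 0.0) + eps
    return w / w.sum()

# Task 0 is improving, task 1 is diverging -> task 1 dominates the bias.
hist = [[1.0, 1.0], [0.8, 1.1], [0.6, 1.3]]
w = dynamic_task_weights(hist)
print(w)
```

The combined update would then be the w-weighted sum of per-task gradients, recomputed every few iterations.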
    Multifidelity deep neural operators for efficient learning of partial differential equations with application to fast inverse design of nanoscale heat transport. (arXiv:2204.06684v1 [physics.comp-ph])
    Deep neural operators can learn operators mapping between infinite-dimensional function spaces via deep neural networks and have become an emerging paradigm of scientific machine learning. However, training neural operators usually requires a large amount of high-fidelity data, which is often difficult to obtain in real engineering problems. Here, we address this challenge by using multifidelity learning, i.e., learning from multifidelity datasets. We develop a multifidelity neural operator based on a deep operator network (DeepONet). A multifidelity DeepONet includes two standard DeepONets coupled by residual learning and input augmentation. Multifidelity DeepONet significantly reduces the required amount of high-fidelity data and achieves an error one order of magnitude smaller when using the same amount of high-fidelity data. We apply a multifidelity DeepONet to learn the phonon Boltzmann transport equation (BTE), a framework to compute nanoscale heat transport. By combining a trained multifidelity DeepONet with a genetic algorithm or topology optimization, we demonstrate a fast solver for the inverse design of BTE problems.  ( 2 min )
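    The residual-learning idea behind the multifidelity coupling can be sketched with polynomial surrogates standing in for the two DeepONets; the functions, sample counts, and degrees are toy assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
f_lo = lambda x: np.sin(2 * x)                  # cheap, biased model
f_hi = lambda x: np.sin(2 * x) + 0.3 * x**2     # expensive ground truth

# Many low-fidelity samples, only a few high-fidelity samples.
x_lo = np.linspace(0, 1, 200)
x_hi = np.linspace(0, 1, 8)

# Surrogate for the low-fidelity response (polynomial stand-in for DeepONet 1).
lo_fit = np.polynomial.Polynomial.fit(x_lo, f_lo(x_lo), deg=5)
# Residual learning: a second model fits only the LF -> HF correction.
res_fit = np.polynomial.Polynomial.fit(x_hi, f_hi(x_hi) - lo_fit(x_hi), deg=2)

multifid = lambda x: lo_fit(x) + res_fit(x)
x_test = np.linspace(0, 1, 50)
print(np.max(np.abs(multifid(x_test) - f_hi(x_test))))
```

Because the residual is smoother than the full response, the few high-fidelity points suffice, which is the intuition behind the data savings reported above.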
    Leveraging Natural Language Processing to Uncover Themes in Clinical Notes of Patients Admitted for Heart Failure. (arXiv:2204.07074v1 [cs.LG])
    Heart failure occurs when the heart is not able to pump blood and oxygen to support other organs in the body as it should. Treatments include medications and sometimes hospitalization. Patients with heart failure can have both cardiovascular as well as non-cardiovascular comorbidities. Clinical notes of patients with heart failure can be analyzed to gain insight into the topics discussed in these notes and the major comorbidities in these patients. In this regard, we apply machine learning techniques, such as topic modeling, to identify the major themes found in the clinical notes specific to the procedures performed on 1,200 patients admitted for heart failure at the University of Illinois Hospital and Health Sciences System (UI Health). Topic modeling revealed five hidden themes in these clinical notes, including one related to heart disease comorbidities.  ( 2 min )
    The Vision of Self-Evolving Computing Systems. (arXiv:2204.06825v1 [cs.SE])
    Computing systems are omnipresent; their sustainability has become crucial for our society. A key aspect of this sustainability is the ability of computing systems to cope with the continuous change they face, ranging from dynamic operating conditions, to changing goals, and technological progress. While we are able to engineer smart computing systems that autonomously deal with various types of changes, handling unanticipated changes requires system evolution, which remains in essence a human-centered process. This will eventually become unmanageable. To break through the status quo, we put forward an arguable opinion for the vision of self-evolving computing systems that are equipped with an evolutionary engine enabling them to evolve autonomously. Specifically, when a self-evolving computing system detects conditions outside its operational domain, such as an anomaly or a new goal, it activates an evolutionary engine that runs online experiments to determine how the system needs to evolve to deal with the changes, thereby evolving its architecture. During this process the engine can integrate new computing elements that are provided by computing warehouses. These computing elements provide specifications and procedures enabling their automatic integration. We motivate the need for self-evolving computing systems in light of the state of the art, outline a conceptual architecture of self-evolving computing systems, and illustrate the architecture for a future smart city mobility system that needs to evolve continuously with changing conditions. To conclude, we highlight key research challenges to realize the vision of self-evolving computing systems.  ( 2 min )
    Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin-picking. (arXiv:2204.07049v1 [cs.RO])
    In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results, and leverage them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model. We evaluate our method on a public benchmark and our newly-released dataset, achieving an ADD(-S) improvement of 11.49% and 22.62% respectively. Our method is also able to improve robotic bin-picking success by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.  ( 2 min )
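    The teacher-student pseudo-labeling step can be sketched as follows; the confidence-threshold rule is a toy stand-in for the paper's comprehensive adaptive selection scheme:

```python
import numpy as np

def select_pseudo_labels(preds, confidences, threshold=0.8):
    """Keep teacher predictions whose confidence exceeds a threshold
    and use them as pseudo labels for the student. A hypothetical,
    simplified selection rule."""
    keep = confidences >= threshold
    return preds[keep], keep

# Toy teacher outputs on four unlabeled real samples.
preds = np.array([0, 2, 1, 3])
conf = np.array([0.95, 0.40, 0.85, 0.99])
labels, mask = select_pseudo_labels(preds, conf)
print(labels)  # low-confidence sample 1 is dropped
```

Iterating this step, with the trained student promoted to teacher, is what progressively refines the pseudo labels.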
    Magnetic Resonance Spectroscopy Deep Learning Denoising Using Few In Vivo Data. (arXiv:2101.11442v2 [physics.med-ph] UPDATED)
    Magnetic Resonance Spectroscopy (MRS) is a noninvasive tool to reveal metabolic information. One challenge of 1H-MRS is the low Signal-to-Noise Ratio (SNR). To improve the SNR, a typical approach is to perform Signal Averaging (SA) with M repeated samples. The data acquisition time, however, is increased by M times accordingly, and a complete clinical MRS scan takes approximately 10 minutes at a common setting of M=128. Recently, deep learning has been introduced to improve the SNR, but most approaches use simulated data as the training set. This may hinder MRS applications, since potential differences, such as acquisition system imperfections and physiological and psychological conditions, may exist between simulated and in vivo data. Here, we propose a new scheme that uses only the repeated samples of realistic data. A deep learning model, Refusion Long Short-Term Memory (ReLSTM), was designed to learn the mapping from low-SNR time-domain data (24 SA) to high-SNR data (128 SA). Experiments on the in vivo brain spectra of 7 healthy subjects, 2 brain tumor patients, and 1 cerebral infarction patient showed that, using only 20% of the repeated samples, the spectra denoised by ReLSTM provide estimated metabolite concentrations comparable to 128 SA. Compared with the state-of-the-art low-rank denoising method, ReLSTM achieved lower relative errors and Cram\'er-Rao lower bounds in quantifying some important biomarkers. In summary, ReLSTM can perform high-fidelity denoising of spectra under fast acquisition (24 SA), which would be valuable for MRS clinical studies.  ( 2 min )
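    The SNR gain from signal averaging that motivates the 24-SA to 128-SA mapping (roughly sqrt(M)) can be checked on a toy signal; the synthetic FID-like waveform and noise level are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0, 1, 512)
clean = np.exp(-5 * t) * np.cos(2 * np.pi * 40 * t)   # toy FID-like signal

def snr(sig, ref):
    noise = sig - ref
    return 10 * np.log10(np.sum(ref**2) / np.sum(noise**2))

# Averaging M repeated noisy acquisitions improves SNR by roughly sqrt(M),
# i.e. about 10*log10(128/24) ~ 7 dB going from 24 SA to 128 SA.
def average(M):
    acq = clean + 0.5 * rng.standard_normal((M, t.size))
    return acq.mean(axis=0)

print(snr(average(24), clean), snr(average(128), clean))
```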
    Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity. (arXiv:2204.06618v1 [cs.CC])
    This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT). We show that UHAT and GUHAT Transformers, viewed as string acceptors, can only recognize formal languages in the complexity class AC$^0$, the class of languages recognizable by families of Boolean circuits of constant depth and polynomial size. This upper bound subsumes Hahn's (2020) results that GUHAT cannot recognize the DYCK languages or the PARITY language, since those languages are outside AC$^0$ (Furst et al., 1984). In contrast, the non-AC$^0$ languages MAJORITY and DYCK-1 are recognizable by AHAT networks, implying that AHAT can recognize languages that UHAT and GUHAT cannot.  ( 2 min )
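    Why AHAT can recognize MAJORITY is easy to sketch: uniform averaging attention over all positions yields the fraction of ones, which a threshold then decides. A minimal illustration of that mechanism, not a full Transformer:

```python
import numpy as np

def majority_ahat(bits):
    """Averaging hard attention over every position: the attention
    output is the mean of the attended token values; thresholding at
    1/2 decides MAJORITY."""
    values = np.asarray(bits, dtype=float)
    attn_out = values.mean()        # uniform averaging attention
    return int(attn_out > 0.5)

print(majority_ahat([1, 1, 0, 1]), majority_ahat([0, 1, 0, 0]))
```

Unique hard attention, by contrast, can inspect only one position per head, which is the intuition behind the AC$^0$ upper bound for UHAT and GUHAT.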
    Network state Estimation using Raw Video Analysis: vQoS-GAN based non-intrusive Deep Learning Approach. (arXiv:2204.07062v1 [cs.MM])
    Content-based providers transmit real-time complex signals, such as video data, from one region to another. During this transmission, the signals often end up distorted or degraded, and the actual information present in the video is lost. This commonly happens in streaming video service applications. Hence, there is a need to know the level of degradation on the receiver side. This video degradation can be estimated from network state parameters such as data rate and packet loss. Our proposed solution, vQoS-GAN (video Quality of Service Generative Adversarial Network), estimates the network state parameters from the degraded received video data using a deep learning approach based on a semi-supervised generative adversarial network. A robust and unique deep learning network model has been trained with the video data along with data rate and packet loss class labels, achieving over 95 percent training accuracy. The proposed semi-supervised generative adversarial network can additionally reconstruct the degraded video data to its original form for a better end-user experience.  ( 2 min )
    Clifford Circuits can be Properly PAC Learned if and only if $\textsf{RP}=\textsf{NP}$. (arXiv:2204.06638v1 [quant-ph])
    Given a dataset of input states, measurements, and probabilities, is it possible to efficiently predict the measurement probabilities associated with a quantum circuit? Recent work of Caro and Datta (2020) studied the problem of PAC learning quantum circuits in an information theoretic sense, leaving open questions of computational efficiency. In particular, one candidate class of circuits for which an efficient learner might have been possible was that of Clifford circuits, since the corresponding set of states generated by such circuits, called stabilizer states, are known to be efficiently PAC learnable (Rocchetto 2018). Here we provide a negative result, showing that proper learning of CNOT circuits is hard for classical learners unless $\textsf{RP} = \textsf{NP}$. As the classical analogue and subset of Clifford circuits, this naturally leads to a hardness result for Clifford circuits as well. Additionally, we show that if $\textsf{RP} = \textsf{NP}$ then there would exist efficient proper learning algorithms for CNOT and Clifford circuits. By similar arguments, we also find that an efficient proper quantum learner for such circuits exists if and only if $\textsf{NP} \subseteq \textsf{RQP}$.  ( 2 min )
    EEG-ITNet: An Explainable Inception Temporal Convolutional Network for Motor Imagery Classification. (arXiv:2204.06947v1 [cs.LG])
    In recent years, neural networks and especially deep architectures have received substantial attention for EEG signal analysis in the field of brain-computer interfaces (BCIs). In this ongoing research area, end-to-end models are more favoured than traditional approaches requiring signal transformations prior to classification. They can eliminate the need for prior information from experts and the extraction of handcrafted features. However, although several deep learning algorithms have already been proposed in the literature, achieving high accuracies for classifying motor movements or mental tasks, they often face a lack of interpretability and therefore are not quite favoured by the neuroscience community. The reasons behind this issue can be the high number of parameters and the sensitivity of deep neural networks to capture tiny yet unrelated discriminative features. We propose an end-to-end deep learning architecture called EEG-ITNet and a more comprehensible method to visualise the network's learned patterns. Using inception modules and causal convolutions with dilation, our model can extract rich spectral, spatial, and temporal information from multi-channel EEG signals with less complexity (in terms of the number of trainable parameters) than other existing end-to-end architectures, such as EEG-Inception and EEG-TCNet. Through an exhaustive evaluation on dataset 2a from BCI competition IV and the OpenBMI motor imagery dataset, EEG-ITNet shows up to 5.9\% improvement in classification accuracy in different scenarios with statistical significance compared to its competitors. We also comprehensively explain and support the validity of network illustration from a neuroscientific perspective. Our code is openly available at https://github.com/AbbasSalami/EEG-ITNet  ( 2 min )
    Sketch guided and progressive growing GAN for realistic and editable ultrasound image synthesis. (arXiv:2204.06929v1 [eess.IV])
    Ultrasound (US) imaging is widely used for anatomical structure inspection in clinical diagnosis. The training of new sonographers and deep learning based algorithms for US image analysis usually requires a large amount of data. However, obtaining and labeling large-scale US imaging data are not easy tasks, especially for diseases with low incidence. Realistic US image synthesis can alleviate this problem to a great extent. In this paper, we propose a generative adversarial network (GAN) based image synthesis framework. Our main contributions include: 1) we present the first work that can synthesize realistic B-mode US images with high-resolution and customized texture editing features; 2) to enhance structural details of generated images, we propose to introduce auxiliary sketch guidance into a conditional GAN. We superpose the edge sketch onto the object mask and use the composite mask as the network input; 3) to generate high-resolution US images, we adopt a progressive training strategy to gradually generate high-resolution images from low-resolution images. In addition, a feature loss is proposed to minimize the difference of high-level features between the generated and real images, which further improves the quality of generated images; 4) the proposed US image synthesis method is quite universal and can also be generalized to the US images of other anatomical structures besides the three ones tested in our study (lung, hip joint, and ovary); 5) extensive experiments on three large US image datasets are conducted to validate our method. Ablation studies, customized texture editing, user studies, and segmentation tests demonstrate promising results of our method in synthesizing realistic US images.  ( 2 min )
    Modularity benefits reinforcement learning agents with competing homeostatic drives. (arXiv:2204.06608v1 [cs.LG])
    The problem of balancing conflicting needs is fundamental to intelligence. Standard reinforcement learning algorithms maximize a scalar reward, which requires combining different objective-specific rewards into a single number. Alternatively, different objectives could also be combined at the level of action value, such that specialist modules responsible for different objectives submit different action suggestions to a decision process, each based on rewards that are independent of one another. In this work, we explore the potential benefits of this alternative strategy. We investigate a biologically relevant multi-objective problem, the continual homeostasis of a set of variables, and compare a monolithic deep Q-network to a modular network with a dedicated Q-learner for each variable. We find that the modular agent: a) requires minimal exogenously determined exploration; b) has improved sample efficiency; and c) is more robust to out-of-domain perturbation.  ( 2 min )
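    The modular alternative, in which each objective-specific module submits action values to a shared decision process, can be sketched as follows; the decision rule (summing module Q-values and acting greedily) and the toy numbers are illustrative assumptions:

```python
import numpy as np

# Two specialist modules, each with its own Q-values over 3 actions,
# learned from independent homeostatic rewards (toy numbers).
q_hunger = np.array([2.0, 0.5, 0.1])
q_thirst = np.array([0.1, 0.4, 1.8])

# Monolithic agent: one Q-function over a single summed scalar reward.
# Modular agent: each module votes with its action values, and the
# decision process picks the action with the greatest summed value.
combined = q_hunger + q_thirst
action = int(np.argmax(combined))
print(action, combined)
```

Here action 0 narrowly wins because it serves the hunger objective without badly hurting thirst, which is the kind of trade-off the decision process arbitrates.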
    Learning Invariances with Generalised Input-Convex Neural Networks. (arXiv:2204.07009v1 [cs.LG])
    Considering smooth mappings from input vectors to continuous targets, our goal is to characterise subspaces of the input domain that are invariant under such mappings. Thus, we want to characterise manifolds implicitly defined by level sets. Specifically, this characterisation should be of a global parametric form, which is especially useful for various informed data exploration tasks, such as building grid-based approximations, sampling points along the level curves, or finding trajectories on the manifold. However, global parameterisations can only exist if the level sets are connected. To this end, we introduce a novel and flexible class of neural networks that generalise input-convex networks. These networks represent functions that are guaranteed to have connected level sets forming smooth manifolds on the input space. We further show that global parameterisations of these level sets can always be found efficiently. Lastly, we demonstrate that our novel technique for characterising invariances is a powerful generative data exploration tool in real-world applications, such as computational chemistry.  ( 2 min )
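    The construction underlying input-convex networks, nonnegative weights on top of convex activations, can be checked empirically with a minimal two-layer example; the architecture and midpoint-convexity test below are a toy sketch, not the paper's generalised class:

```python
import numpy as np

rng = np.random.default_rng(4)

# Minimal input-convex network: a nonnegative combination of convex
# (ReLU) units is convex in the input, so every sublevel set is convex
# and, in particular, connected.
W0 = rng.standard_normal((16, 2))
Wz = np.abs(rng.standard_normal((1, 16)))   # nonnegative -> preserves convexity
relu = lambda z: np.maximum(z, 0)
f = lambda x: float(Wz @ relu(W0 @ x))

# Empirical midpoint-convexity check on random input pairs.
ok = all(
    f((a + b) / 2) <= 0.5 * (f(a) + f(b)) + 1e-9
    for a, b in (rng.standard_normal((2, 2)) for _ in range(100))
)
print(ok)  # True
```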
    Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems. (arXiv:2204.07135v1 [cs.LG])
    Skill routing is an important component in large-scale conversational systems. In contrast to traditional rule-based skill routing, state-of-the-art systems use a model-based approach to enable natural conversations. To provide the supervision signal required to train such models, ideas such as human annotation, replication of a rule-based system, relabeling based on user paraphrases, and bandit-based learning have been suggested. However, these approaches (a) do not scale in terms of the number of skills and skill on-boarding, (b) require very costly expert annotation/rule-design, and (c) introduce risks in the user experience with each model update. In this paper, we present a scalable self-learning approach to explore routing alternatives without causing abrupt policy changes that break the user experience, learn from user interaction, and incrementally improve the routing via frequent model refreshes. To enable such robust frequent model updates, we suggest a simple and effective approach that ensures controlled policy updates for individual domains, followed by an off-policy evaluation for making deployment decisions without any need for lengthy A/B experimentation. We conduct various offline and online A/B experiments on a commercial large-scale conversational system to demonstrate the effectiveness of the proposed method in real-world production settings.
    SVAM: Saliency-guided Visual Attention Modeling by Autonomous Underwater Robots. (arXiv:2011.06252v2 [cs.CV] UPDATED)
    This paper presents a holistic approach to saliency-guided visual attention modeling (SVAM) for use by autonomous underwater robots. Our proposed model, named SVAM-Net, integrates deep visual features at various scales and semantics for effective salient object detection (SOD) in natural underwater images. The SVAM-Net architecture is configured in a unique way to jointly accommodate bottom-up and top-down learning within two separate branches of the network while sharing the same encoding layers. We design dedicated spatial attention modules (SAMs) along these learning pathways to exploit the coarse-level and fine-level semantic features for SOD at four stages of abstraction. The bottom-up branch performs a rough yet reasonably accurate saliency estimation at a fast rate, whereas the deeper top-down branch incorporates a residual refinement module (RRM) that provides fine-grained localization of the salient objects. Extensive performance evaluation of SVAM-Net on benchmark datasets clearly demonstrates its effectiveness for underwater SOD. We also validate its generalization performance on data from several ocean trials, including test images of diverse underwater scenes and waterbodies as well as images with unseen natural objects. Moreover, we analyze its computational feasibility for robotic deployments and demonstrate its utility in several important use cases of visual attention modeling.
    Ensemble learning using individual neonatal data for seizure detection. (arXiv:2204.07043v1 [eess.SP])
    Sharing medical data between institutions is difficult in practice due to data protection laws and official procedures within institutions. Therefore, most existing algorithms are trained on relatively small electroencephalogram (EEG) data sets, which is likely to be detrimental to prediction accuracy. In this work, we simulate the case when data cannot be shared by splitting a publicly available data set into disjoint sets representing data in individual institutions. We propose to train a (local) detector in each institution and aggregate their individual predictions into one final prediction. Four aggregation schemes are compared, namely the majority vote, the mean, the weighted mean, and the Dawid-Skene method. The approach allows different detector architectures across institutions. The method was validated on an independent data set using only a subset of EEG channels. The ensemble reaches accuracy comparable to a single detector trained on all the data when a sufficient amount of data is available in each institution. The weighted mean aggregation scheme showed the best overall performance; it was only marginally outperformed by the Dawid-Skene method when local detectors approach the performance of a single detector trained on all available data.
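    Three of the four aggregation schemes (Dawid-Skene being an iterative EM procedure) reduce to one-liners over the per-institution predictions; the per-detector reliabilities used for the weighted mean below are hypothetical:

```python
import numpy as np

# Per-institution detector outputs: probability of seizure per epoch.
preds = np.array([
    [0.9, 0.2, 0.6],   # detector A
    [0.8, 0.4, 0.3],   # detector B
    [0.7, 0.1, 0.7],   # detector C
])
# Hypothetical per-detector reliabilities for the weighted mean.
w = np.array([0.5, 0.2, 0.3])

majority = (preds > 0.5).sum(axis=0) > preds.shape[0] / 2   # vote on binarized outputs
mean = preds.mean(axis=0)                                   # plain average
weighted = w @ preds / w.sum()                              # reliability-weighted average
print(majority, mean, weighted)
```

In practice the weights would be estimated from each detector's validation performance, which is what lets the weighted mean outperform the plain mean.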
    Surrogate NAS Benchmarks: Going Beyond the Limited Search Spaces of Tabular NAS Benchmarks. (arXiv:2008.09777v4 [cs.LG] UPDATED)
    The most significant barrier to the advancement of Neural Architecture Search (NAS) is its demand for large computational resources, which hinders scientifically sound empirical evaluations of NAS methods. Tabular NAS benchmarks have alleviated this problem substantially, making it possible to properly evaluate NAS methods in seconds on commodity machines. However, an unintended consequence of tabular NAS benchmarks has been a focus on extremely small architectural search spaces since their construction relies on exhaustive evaluations of the space. This leads to unrealistic results that do not transfer to larger spaces. To overcome this fundamental limitation, we propose a methodology to create cheap NAS surrogate benchmarks for arbitrary search spaces. We exemplify this approach by creating surrogate NAS benchmarks on the existing tabular NAS-Bench-101 and on two widely used NAS search spaces with up to $10^{21}$ architectures ($10^{13}$ times larger than any previous tabular NAS benchmark). We show that surrogate NAS benchmarks can model the true performance of architectures better than tabular benchmarks (at a small fraction of the cost), that they lead to faithful estimates of how well different NAS methods work on the original non-surrogate benchmark, and that they can generate new scientific insight. We open-source all our code and believe that surrogate NAS benchmarks are an indispensable tool to extend scientifically sound work on NAS to large and exciting search spaces.
    Incompleteness of graph convolutional neural networks for points clouds in three dimensions. (arXiv:2201.07136v2 [stat.ML] UPDATED)
    Graph neural networks (GNN) are very popular methods in machine learning and have been applied very successfully to the prediction of the properties of molecules and materials. First-order GNNs are well known to be incomplete, i.e., there exist graphs that are distinct but appear identical when seen through the lens of the GNN. More complicated schemes have thus been designed to increase their resolving power. Applications to molecules (and more generally, point clouds), however, add a geometric dimension to the problem. The most straightforward and prevalent approach to construct graph representation for molecules regards atoms as vertices in a graph and draws a bond between each pair of atoms within a chosen cutoff. Bonds can be decorated with the distance between atoms, and the resulting "distance graph NNs" (dGNN) have empirically demonstrated excellent resolving power and are widely used in chemical ML, with all known indistinguishable graphs being resolved in the fully-connected limit. Here we show that even for the restricted case of fully-connected graphs induced by 3D atom clouds dGNNs are not complete. We construct pairs of distinct point clouds that generate graphs that, for any cutoff radius, are equivalent based on a first-order Weisfeiler-Lehman test. This class of degenerate structures includes chemically-plausible configurations, setting an ultimate limit to the expressive power of some of the well-established GNN architectures for atomistic machine learning. Models that explicitly use angular or directional information in the description of atomic environments can resolve these degeneracies.
    Global Counterfactual Explanations: Investigations, Implementations and Improvements. (arXiv:2204.06917v1 [cs.LG])
    Counterfactual explanations have been widely studied in explainability, with a range of application dependent methods emerging in fairness, recourse and model understanding. However, the major shortcoming associated with these methods is their inability to provide explanations beyond the local or instance-level. While some works touch upon the notion of a global explanation, typically suggesting to aggregate masses of local explanations in the hope of ascertaining global properties, few provide frameworks that are either reliable or computationally tractable. Meanwhile, practitioners are requesting more efficient and interactive explainability tools. We take this opportunity to investigate existing global methods, with a focus on implementing and improving Actionable Recourse Summaries (AReS), the only known global counterfactual explanation framework for recourse.
    Sparse Interaction Neighborhood Selection for Markov Random Fields via Reversible Jump and Pseudoposteriors. (arXiv:2204.05933v2 [stat.CO] UPDATED)
    We consider the problem of estimating the interacting neighborhood of a Markov Random Field model with finite support and homogeneous pairwise interactions based on relative positions of a two-dimensional lattice. Using a Bayesian framework, we propose a Reversible Jump Monte Carlo Markov Chain algorithm that jumps across subsets of a maximal range neighborhood, allowing us to perform model selection based on a marginal pseudoposterior distribution of models.
    Improving Computational Complexity in Statistical Models with Second-Order Information. (arXiv:2202.04219v3 [stat.ML] UPDATED)
    It is known that when the statistical models are singular, i.e., the Fisher information matrix at the true parameter is degenerate, the fixed step-size gradient descent algorithm takes a polynomial number of steps in terms of the sample size $n$ to converge to a final statistical radius around the true parameter, which can be unsatisfactory for applications. To further improve that computational complexity, we consider the utilization of second-order information in the design of optimization algorithms. Specifically, we study the normalized gradient descent (NormGD) algorithm for solving parameter estimation in parametric statistical models, which is a variant of the gradient descent algorithm whose step size is scaled by the maximum eigenvalue of the Hessian matrix of the empirical loss function of statistical models. When the population loss function, i.e., the limit of the empirical loss function when $n$ goes to infinity, is homogeneous in all directions, we demonstrate that the NormGD iterates reach a final statistical radius around the true parameter after a logarithmic number of iterations in terms of $n$. Therefore, for fixed dimension $d$, the NormGD algorithm achieves the optimal overall computational complexity $\mathcal{O}(n)$ to reach the final statistical radius. This computational complexity is cheaper than that of the fixed step-size gradient descent algorithm, which is of the order $\mathcal{O}(n^{\tau})$ for some $\tau > 1$, to reach the same statistical radius. We illustrate our general theory under two statistical models: generalized linear models and mixture models, and experimental results support our general theory.
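    The core NormGD update is easy to sketch: scale each gradient step by the largest eigenvalue of the current Hessian. Below is a minimal illustration on the toy singular loss $f(\theta)=\|\theta\|^4$ (an assumed example, not a statistical model from the paper), where NormGD contracts linearly while fixed step-size gradient descent stalls near the degenerate optimum.

```python
import numpy as np

def loss_grad_hess(theta):
    # Toy singular loss f(theta) = ||theta||^4: its Hessian degenerates
    # at the true parameter theta* = 0, the regime the paper targets.
    r2 = theta @ theta
    grad = 4.0 * r2 * theta
    hess = 8.0 * np.outer(theta, theta) + 4.0 * r2 * np.eye(theta.size)
    return grad, hess

def norm_gd(theta, iters):
    # NormGD: step size scaled by the largest Hessian eigenvalue.
    for _ in range(iters):
        grad, hess = loss_grad_hess(theta)
        theta = theta - grad / np.linalg.eigvalsh(hess)[-1]
    return theta

def fixed_gd(theta, iters, eta=0.01):
    for _ in range(iters):
        theta = theta - eta * loss_grad_hess(theta)[0]
    return theta

theta0 = np.array([1.0, -0.5, 0.25])
# NormGD contracts by a constant factor (2/3 here) per step; fixed-step
# GD slows to a crawl as ||theta|| shrinks and the Hessian flattens.
print(np.linalg.norm(norm_gd(theta0, 50)), np.linalg.norm(fixed_gd(theta0, 50)))
```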
    Ranking Feature-Block Importance in Artificial Multiblock Neural Networks. (arXiv:2109.10279v2 [cs.LG] UPDATED)
    In artificial neural networks, understanding the contributions of input features to the prediction fosters model explainability and delivers relevant information about the dataset. While typical setups for feature importance ranking assess input features individually, in this study, we go one step further and rank the importance of groups of features, denoted as feature-blocks. A feature-block can contain features of a specific type or features derived from a particular source, which are presented to the neural network in separate input branches (multiblock ANNs). This work presents three methods pursuing distinct strategies to rank features in multiblock ANNs by their importance: (1) a composite strategy building on individual feature importance rankings, (2) a knock-in, and (3) a knock-out strategy. While the composite strategy builds on state-of-the-art feature importance rankings, knock-in and knock-out strategies evaluate the block as a whole via a mutual information criterion. Our experiments consist of a simulation study validating all three approaches, followed by a case study on two distinct real-world datasets to compare the strategies. We conclude that each strategy has its merits for specific application scenarios.
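    A minimal sketch of the knock-out idea, using a squared-error increase on a linear model as the importance score (the paper itself scores blocks via a mutual information criterion, so the criterion and the synthetic data below are simplifying assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Two feature-blocks, e.g. features derived from two different sources.
blocks = {"sensorA": rng.normal(size=(n, 3)), "sensorB": rng.normal(size=(n, 2))}
cols = {"sensorA": slice(0, 3), "sensorB": slice(3, 5)}
# Ground truth: sensorA drives the target far more strongly than sensorB.
y = 3.0 * blocks["sensorA"][:, 0] + 0.2 * blocks["sensorB"][:, 0] + 0.1 * rng.normal(size=n)

X = np.hstack([blocks["sensorA"], blocks["sensorB"]])
beta = np.linalg.lstsq(X, y, rcond=None)[0]   # stand-in for a trained model

def knockout_mse(name):
    # Knock a block out by replacing it with its (zero) mean and measure
    # how much the error grows: larger growth = more important block.
    Xk = X.copy()
    Xk[:, cols[name]] = 0.0
    return np.mean((Xk @ beta - y) ** 2)

ranking = sorted(cols, key=knockout_mse, reverse=True)
print(ranking)   # sensorA is ranked first
```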
    Procrastinated Tree Search: Black-box Optimization with Delayed, Noisy, and Multi-Fidelity Feedback. (arXiv:2110.07232v2 [cs.LG] UPDATED)
    In black-box optimization problems, we aim to maximize an unknown objective function, where the function is only accessible through feedback from an evaluation or simulation oracle. In real life, the feedback from such oracles is often noisy and available after some unknown delay that may depend on the computation time of the oracle. Additionally, if the exact evaluations are expensive but coarse approximations are available at a lower cost, the feedback can be multi-fidelity. In order to address this problem, we propose a generic extension of hierarchical optimistic tree search (HOO), called ProCrastinated Tree Search (PCTS), that flexibly accommodates a delay and noise-tolerant bandit algorithm. We provide a generic proof technique to quantify the regret of PCTS under delayed, noisy, and multi-fidelity feedback. Specifically, we derive regret bounds of PCTS enabled with delayed-UCB1 (DUCB1) and delayed-UCB-V (DUCBV) algorithms. Given a horizon $T$, PCTS retains the regret bound of non-delayed HOO for expected delay of $O(\log T)$ and worsens by $O(T^{\frac{1-\alpha}{d+2}})$ for expected delays of $O(T^{1-\alpha})$ for $\alpha \in (0,1]$. We experimentally validate on multiple synthetic functions and hyperparameter tuning problems that PCTS outperforms the state-of-the-art black-box optimization methods for feedback with different noise levels, delays, and fidelity.
    Fairness without Imputation: A Decision Tree Approach for Fair Prediction with Missing Values. (arXiv:2109.10431v2 [cs.LG] UPDATED)
    We investigate the fairness concerns of training a machine learning model using data with missing values. Even though there are a number of fairness intervention methods in the literature, most of them require a complete training set as input. In practice, data can have missing values, and data missing patterns can depend on group attributes (e.g. gender or race). Simply applying off-the-shelf fair learning algorithms to an imputed dataset may lead to an unfair model. In this paper, we first theoretically analyze different sources of discrimination risks when training with an imputed dataset. Then, we propose an integrated approach based on decision trees that does not require a separate process of imputation and learning. Instead, we train a tree with missing incorporated as attribute (MIA), which does not require explicit imputation, and we optimize a fairness-regularized objective function. We demonstrate that our approach outperforms existing fairness intervention methods applied to an imputed dataset, through several experiments on real-world datasets.
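    The MIA split rule is simple to sketch: all missing values are routed as a block to whichever child minimizes impurity, so no imputation is needed. A minimal version with a Gini criterion follows (the fairness-regularized objective from the paper is omitted here):

```python
import numpy as np

def gini(y):
    # Gini impurity of a binary 0/1 label vector.
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def mia_split_score(x, y, threshold):
    # MIA: treat "missing" as its own attribute value. Missing rows are
    # routed to the left or right child wholesale, whichever direction
    # gives the lower weighted impurity.
    miss = np.isnan(x)
    left = (~miss) & (x <= threshold)
    right = (~miss) & (x > threshold)
    best, best_side = np.inf, None
    for side in ("left", "right"):
        l = (left | miss) if side == "left" else left
        r = (right | miss) if side == "right" else right
        score = (l.sum() * gini(y[l]) + r.sum() * gini(y[r])) / len(y)
        if score < best:
            best, best_side = score, side
    return best, best_side

x = np.array([0.1, 0.2, np.nan, 0.9, np.nan, 0.8])   # feature with missing values
y = np.array([0, 0, 0, 1, 0, 1])
print(mia_split_score(x, y, 0.5))   # missing rows (both labeled 0) go left
```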
    Non-stationary Bandits and Meta-Learning with a Small Set of Optimal Arms. (arXiv:2202.13001v3 [cs.LG] UPDATED)
    We study a sequential decision problem where the learner faces a sequence of $K$-armed stochastic bandit tasks. The tasks may be designed by an adversary, but the adversary is constrained to choose the optimal arm of each task in a smaller (but unknown) subset of $M$ arms. The task boundaries might be known (the bandit meta-learning setting), or unknown (the non-stationary bandit setting), and the number of tasks $N$ as well as the total number of rounds $T$ are known ($N$ could be unknown in the meta-learning setting). We design an algorithm based on a reduction to bandit submodular maximization, and show that its regret in both settings is smaller than the simple baseline of $\tilde{O}(\sqrt{KNT})$ that can be obtained by using standard algorithms designed for non-stationary bandit problems. For the bandit meta-learning problem with fixed task length $\tau$, we show that the regret of the algorithm is bounded as $\tilde{O}(N\sqrt{M \tau}+N^{2/3})$. Under additional assumptions on the identifiability of the optimal arms in each task, we show a bandit meta-learning algorithm with an improved $\tilde{O}(N\sqrt{M \tau}+N^{1/2})$ regret.
    Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback. (arXiv:2204.06660v1 [cs.LG])
    We study the problem of expert advice under a partial bandit feedback setting and create a sequential minimax optimal algorithm. Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner. Our algorithm adopts a universal prediction perspective, whose performance is analyzed with regret against a general expert selection sequence. The regret we study is against a general competition class that covers many settings (such as the switching or contextual experts settings) and the expert selection sequences in the competition class are determined by the application at hand. Our regret bounds are second order bounds in terms of the sum of squared losses and the normalized regret of our algorithm is invariant under arbitrary affine transforms of the loss sequence. Our algorithm is truly online and does not use any preliminary information about the loss sequences.  ( 2 min )
    To Split or Not to Split: The Impact of Disparate Treatment in Classification. (arXiv:2002.04788v4 [cs.LG] UPDATED)
    Disparate treatment occurs when a machine learning model yields different decisions for individuals based on a sensitive attribute (e.g., age, sex). In domains where prediction accuracy is paramount, it could potentially be acceptable to fit a model which exhibits disparate treatment. To evaluate the effect of disparate treatment, we compare the performance of split classifiers (i.e., classifiers trained and deployed separately on each group) with group-blind classifiers (i.e., classifiers which do not use a sensitive attribute). We introduce the benefit-of-splitting for quantifying the performance improvement by splitting classifiers. Computing the benefit-of-splitting directly from its definition could be intractable since it involves solving optimization problems over an infinite-dimensional functional space. Under different performance measures, we (i) prove an equivalent expression for the benefit-of-splitting which can be efficiently computed by solving small-scale convex programs; (ii) provide sharp upper and lower bounds for the benefit-of-splitting which reveal precise conditions where a group-blind classifier will always suffer from a non-trivial performance gap from the split classifiers. In the finite sample regime, splitting is not necessarily beneficial and we provide data-dependent bounds to understand this effect. Finally, we validate our theoretical results through numerical experiments on both synthetic and real-world datasets.  ( 2 min )
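    The quantity being studied can be illustrated with one-dimensional Gaussian groups (a toy construction, not from the paper): when two groups' optimal decision thresholds differ, the best single group-blind threshold must compromise, and the resulting accuracy gap is the benefit-of-splitting.

```python
import numpy as np
from math import erf, sqrt

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def group_accuracy(t, mu_neg, mu_pos):
    # Accuracy of the rule "predict Y=+1 iff X > t" when, within a group,
    # X | Y=-1 ~ N(mu_neg, 1) and X | Y=+1 ~ N(mu_pos, 1), balanced classes.
    return 0.5 * Phi(t - mu_neg) + 0.5 * (1.0 - Phi(t - mu_pos))

groups = [(-1.0, 1.0), (1.0, 3.0)]   # the groups' optimal thresholds differ (0 vs 2)
ts = np.linspace(-4.0, 6.0, 2001)
split_acc = np.mean([max(group_accuracy(t, *g) for t in ts) for g in groups])
blind_acc = max(np.mean([group_accuracy(t, *g) for g in groups]) for t in ts)
print(split_acc - blind_acc)         # benefit-of-splitting, ~0.10 here
```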
    Classification of Hyperspectral Images Using SVM with Shape-adaptive Reconstruction and Smoothed Total Variation. (arXiv:2203.15619v3 [cs.CV] UPDATED)
    In this work, a novel algorithm called SVM with Shape-adaptive Reconstruction and Smoothed Total Variation (SaR-SVM-STV) is introduced to classify hyperspectral images, which makes full use of spatial and spectral information. The Shape-adaptive Reconstruction (SaR) is introduced to preprocess each pixel based on the Pearson Correlation between pixels in its shape-adaptive (SA) region. Support Vector Machines (SVMs) are trained to estimate the pixel-wise probability maps of each class. Then the Smoothed Total Variation (STV) model is applied to denoise and generate the final classification map. Experiments show that SaR-SVM-STV outperforms the SVM-STV method with a few training labels, demonstrating the significance of reconstructing hyperspectral images before classification.  ( 2 min )
    Data Augmentation for Bayesian Deep Learning. (arXiv:1903.09668v3 [stat.ML] UPDATED)
    Deep Learning (DL) methods have emerged as one of the most powerful tools for functional approximation and prediction. While the representation properties of DL have been well studied, uncertainty quantification remains challenging and largely unexplored. Data augmentation techniques are a natural approach to provide uncertainty quantification and to incorporate stochastic Monte Carlo search into stochastic gradient descent (SGD) methods. The purpose of our paper is to show that training DL architectures with data augmentation leads to efficiency gains. We use the theory of scale mixtures of normals to derive data augmentation strategies for deep learning. This allows variants of the expectation-maximization and MCMC algorithms to be brought to bear on these high dimensional nonlinear deep learning models. To demonstrate our methodology, we develop data augmentation algorithms for a variety of commonly used activation functions: logit, ReLU, leaky ReLU and SVM. Our methodology is compared to traditional stochastic gradient descent with back-propagation. Our optimization procedure leads to a version of iteratively re-weighted least squares and can be implemented at scale with accelerated linear algebra methods providing substantial improvement in speed. We illustrate our methodology on a number of standard datasets. Finally, we conclude with directions for future research.  ( 2 min )
    Observable adjustments in single-index models for regularized M-estimators. (arXiv:2204.06990v1 [math.ST])
    We consider observations $(X,y)$ from single index models with unknown link function, Gaussian covariates and a regularized M-estimator $\hat\beta$ constructed from convex loss function and regularizer. In the regime where sample size $n$ and dimension $p$ are both increasing such that $p/n$ has a finite limit, the behavior of the empirical distribution of $\hat\beta$ and the predicted values $X\hat\beta$ has been previously characterized in a number of models: The empirical distributions are known to converge to proximal operators of the loss and penalty in a related Gaussian sequence model, which captures the interplay between ratio $p/n$, loss, regularization and the data generating process. This connection between $(\hat\beta,X\hat\beta)$ and the corresponding proximal operators requires solving fixed-point equations that typically involve unobservable quantities such as the prior distribution on the index or the link function. This paper develops a different theory to describe the empirical distribution of $\hat\beta$ and $X\hat\beta$: Approximations of $(\hat\beta,X\hat\beta)$ in terms of proximal operators are provided that only involve observable adjustments. These proposed observable adjustments are data-driven, i.e., do not require prior knowledge of the index or the link function. These new adjustments yield confidence intervals for individual components of the index, as well as estimators of the correlation of $\hat\beta$ with the index. The interplay between loss, regularization and the model is thus captured in a data-driven manner, without solving the fixed-point equations studied in previous works. The results apply to both strongly convex regularizers and unregularized M-estimation. Simulations are provided for the square and logistic loss in single index models including logistic regression and 1-bit compressed sensing with 20\% corrupted bits.  ( 2 min )
    Concentration of Random Feature Matrices in High-Dimensions. (arXiv:2204.06935v1 [stat.ML])
    The spectra of random feature matrices provide essential information on the conditioning of the linear system used in random feature regression problems and are thus connected to the consistency and generalization of random feature models. Random feature matrices are asymmetric rectangular nonlinear matrices depending on two input variables, the data and the weights, which can make their characterization challenging. We consider two settings for the two input variables, either both are random variables or one is a random variable and the other is well-separated, i.e. there is a minimum distance between points. With conditions on the dimension, the complexity ratio, and the sampling variance, we show that the singular values of these matrices concentrate near their full expectation and near one with high probability. In particular, since the dimension depends only on the logarithm of the number of random weights or the number of data points, our complexity bounds can be achieved even in moderate dimensions for many practical settings. The theoretical results are verified with numerical experiments.  ( 2 min )
    Modelling Non-Smooth Signals with Complex Spectral Structure. (arXiv:2203.06997v2 [stat.ML] UPDATED)
    The Gaussian Process Convolution Model (GPCM; Tobar et al., 2015a) is a model for signals with complex spectral structure. A significant limitation of the GPCM is that it assumes a rapidly decaying spectrum: it can only model smooth signals. Moreover, inference in the GPCM currently requires (1) a mean-field assumption, resulting in poorly calibrated uncertainties, and (2) a tedious variational optimisation of large covariance matrices. We redesign the GPCM model to induce a richer distribution over the spectrum with relaxed assumptions about smoothness: the Causal Gaussian Process Convolution Model (CGPCM) introduces a causality assumption into the GPCM, and the Rough Gaussian Process Convolution Model (RGPCM) can be interpreted as a Bayesian nonparametric generalisation of the fractional Ornstein-Uhlenbeck process. We also propose a more effective variational inference scheme, going beyond the mean-field assumption: we design a Gibbs sampler which directly samples from the optimal variational solution, circumventing any variational optimisation entirely. The proposed variations of the GPCM are validated in experiments on synthetic and real-world data, showing promising results.  ( 2 min )
    Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning. (arXiv:2204.06645v1 [cs.LG])
    In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a parameter-free nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise quadratic Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global techniques.  ( 2 min )
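    The pipeline is concrete even in one dimension, where the 2-Wasserstein distance between equal-size empirical measures has a closed form (sort and compare) and the embedding step is classical MDS. The sketch below is a simplification of the image setting in the paper; it recovers a translation parameter exactly:

```python
import numpy as np

def w2_1d(x, y):
    # Exact W2 between equal-size empirical measures on the line:
    # the monotone (sorted) coupling is optimal in 1D.
    return np.sqrt(np.mean((np.sort(x) - np.sort(y)) ** 2))

def classical_mds(D, k):
    # Classical multidimensional scaling from a pairwise distance matrix.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J        # double-centered Gram matrix
    w, V = np.linalg.eigh(B)           # ascending eigenvalues
    w, V = w[::-1][:k], V[:, ::-1][:, :k]
    return V * np.sqrt(np.maximum(w, 0.0))

rng = np.random.default_rng(0)
base = rng.normal(size=200)                 # a fixed generating sample
shifts = np.array([0.0, 1.0, 2.5, 4.0])     # the manifold parameter: translation
clouds = [base + s for s in shifts]
D = np.array([[w2_1d(a, b) for b in clouds] for a in clouds])
emb = classical_mds(D, 1).ravel()
# Pairwise distances in the 1-D embedding match the true parameter gaps.
print(np.abs(emb[:, None] - emb[None, :]))
```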
    Regret, stability & fairness in matching markets with bandit learners. (arXiv:2102.06246v2 [cs.LG] UPDATED)
    Making an informed decision -- for example, when choosing a career or housing -- requires knowledge about the available options. Such knowledge is generally acquired through costly trial and error, but this learning process can be disrupted by competition. In this work, we study how competition affects the long-term outcomes of individuals as they learn. We build on a line of work that models this setting as a two-sided matching market with bandit learners. A recent result in this area states that it is impossible to simultaneously guarantee two natural desiderata: stability and low optimal regret for all agents. Resource-allocating platforms can point to this result as a justification for assigning good long-term outcomes to some agents and poor ones to others. We show that this impossibility need not hold true. In particular, by modeling two additional components of competition -- namely, costs and transfers -- we prove that it is possible to simultaneously guarantee four desiderata: stability, low optimal regret, fairness in the distribution of regret, and high social welfare.  ( 2 min )
    Achieving Representative Data via Convex Hull Feasibility Sampling Algorithms. (arXiv:2204.06664v1 [stat.ML])
    Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is simply collecting more representative training data. In this paper, we consider the task of assembling a training dataset in which minority groups are adequately represented from a given set of data sources. In essence, this is an adaptive sampling problem to determine if a given point lies in the convex hull of the means from a set of unknown distributions. We present adaptive sampling methods to determine, with high confidence, whether it is possible to assemble a representative dataset from the given data sources. We also demonstrate the efficacy of our policies in simulations in Bernoulli and multinomial settings.  ( 2 min )
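    The geometric core of the problem, ignoring the sampling noise that the paper's adaptive policies handle, is checking whether a point lies in the convex hull of a set of means. A Frank-Wolfe sketch of that noiseless subproblem (the solver choice is an assumption, not the paper's method):

```python
import numpy as np

def dist2_to_hull(mus, p, iters=5000):
    # Squared distance from p to the convex hull of the rows of `mus`,
    # via Frank-Wolfe on f(lam) = ||mus.T @ lam - p||^2 over the simplex.
    k = mus.shape[0]
    lam = np.zeros(k)
    lam[0] = 1.0
    for t in range(iters):
        grad = 2.0 * mus @ (mus.T @ lam - p)   # gradient w.r.t. lam
        s = np.argmin(grad)                    # best vertex of the simplex
        gamma = 2.0 / (t + 2.0)
        lam = (1.0 - gamma) * lam
        lam[s] += gamma
    x = mus.T @ lam
    return float((x - p) @ (x - p))

mus = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # means of three sources
print(dist2_to_hull(mus, np.array([0.3, 0.3])))  # inside the hull: ~0
print(dist2_to_hull(mus, np.array([1.0, 1.0])))  # outside: squared distance ~0.5
```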
    Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings. (arXiv:2104.08928v2 [stat.ML] UPDATED)
    Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retailing to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient is tested positive for a disease. Intuitively, we expect that only a small number of domain-specific words may have new meanings/usages. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our estimator, proving that it can achieve the same accuracy (compared to not transfer learning) with substantially less domain-specific data when only a small number of embeddings are altered between domains. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate the effectiveness of our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.  ( 2 min )
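    The group-sparse penalty acts through the group soft-thresholding (proximal) operator: each word's embedding-update row is either shrunk or zeroed out as a whole. A minimal sketch (the rows-as-groups layout and the synthetic domain-shift matrix are illustrative assumptions):

```python
import numpy as np

def group_soft_threshold(B, lam):
    # Proximal operator of lam * sum_g ||B_g||_2 with rows as groups:
    # each row is scaled toward zero and removed entirely unless its
    # l2 norm exceeds lam.
    norms = np.linalg.norm(B, axis=1, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return B * scale

rng = np.random.default_rng(0)
# Synthetic "domain shift" of 100 word embeddings (8 dims): only the first
# three words genuinely change meaning between domains.
Delta = 0.01 * rng.standard_normal((100, 8))
Delta[:3] += 5.0
S = group_soft_threshold(Delta, lam=1.0)
print(np.count_nonzero(np.linalg.norm(S, axis=1)))   # only the 3 shifted rows survive
```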
    Finding MNEMON: Reviving Memories of Node Embeddings. (arXiv:2204.06963v1 [cs.LG])
    Previous security research efforts orbiting around graphs have focused exclusively on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks. Little attention has been paid to understanding the privacy risks of integrating the output from graph embedding models (e.g., node embeddings) with complex downstream machine learning pipelines. In this paper, we fill this gap and propose a novel model-agnostic graph recovery attack that exploits the implicit graph structural information preserved in the embeddings of graph nodes. We show that an adversary can recover edges with decent accuracy by only gaining access to the node embedding matrix of the original graph without interactions with the node embedding models. We demonstrate the effectiveness and applicability of our graph recovery attack through extensive experiments.  ( 2 min )
    Streamable Neural Audio Synthesis With Non-Causal Convolutions. (arXiv:2204.07064v1 [cs.SD])
    Deep learning models are mostly used in an offline inference fashion. However, this strongly limits the use of these models inside audio generation setups, as most creative workflows are based on real-time digital signal processing. Although approaches based on recurrent networks can be naturally adapted to this buffer-based computation, the use of convolutions still poses some serious challenges. To tackle this issue, the use of causal streaming convolutions has been proposed. However, this requires a specific, more complex training procedure and can impact the resulting audio quality. In this paper, we introduce a new method for producing non-causal streaming models. This makes any convolutional model compatible with real-time buffer-based processing. As our method is based on a post-training reconfiguration of the model, we show that it is able to transform models trained without causal constraints into streaming models. We show how our method can be adapted to fit complex architectures with parallel branches. To evaluate our method, we apply it to the recent RAVE model, which provides high-quality real-time audio synthesis. We test our approach on multiple music and speech datasets and show that it is faster than overlap-add methods, while having no impact on the generation quality. Finally, we introduce two open-source implementations of our work as Max/MSP and PureData externals, and as a VST audio plugin. This allows traditional digital audio workstations to perform real-time neural audio synthesis on a laptop CPU.  ( 2 min )
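    The buffer-based computation that makes convolutions awkward can be sketched with a cached causal FIR layer: a left-context cache carried across buffers makes chunked processing bit-exact with offline processing. (The paper's actual contribution, handling non-causal convolutions via a post-training delay, is not reproduced here.)

```python
import numpy as np

class StreamingConv1d:
    # Causal 1-D convolution with cached left context, so audio can be
    # processed buffer-by-buffer: y[t] = sum_k kernel[k] * x[t - k].
    def __init__(self, kernel):
        self.kernel = np.asarray(kernel, dtype=float)
        self.cache = np.zeros(len(self.kernel) - 1)   # left padding across calls

    def __call__(self, buf):
        x = np.concatenate([self.cache, buf])
        self.cache = x[len(buf):]                     # keep last K-1 samples
        return np.convolve(x, self.kernel, mode="valid")  # len(out) == len(buf)

kernel = np.array([0.5, -0.25, 0.1])
sig = np.random.default_rng(1).standard_normal(64)
offline = StreamingConv1d(kernel)(sig)                # one big buffer
stream = StreamingConv1d(kernel)
chunked = np.concatenate([stream(c) for c in np.split(sig, 8)])
print(np.allclose(chunked, offline))                  # True: streaming is exact
```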
    Kernel Thinning. (arXiv:2105.05842v7 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.  ( 2 min )
    Program Analysis of Probabilistic Programs. (arXiv:2204.06868v1 [cs.PL])
    Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference algorithm can be used as a probabilistic programming back-end that is simultaneously reliable, efficient, black-box, and general. Probabilistic programming languages often choose a single algorithm to apply to a given problem, thus inheriting its limitations. While substantial work has been done both to formalise probabilistic programming and to improve efficiency of inference, there has been little work that makes use of the available program structure, by formally analysing it, to better utilise the underlying inference algorithm. This dissertation presents three novel techniques (both static and dynamic), which aim to improve probabilistic programming using program analysis. The techniques analyse a probabilistic program and adapt it to make inference more efficient, sometimes in a way that would have been tedious or impossible to do by hand.  ( 2 min )
    Optimal Stopping via Randomized Neural Networks. (arXiv:2104.13669v2 [stat.ML] UPDATED)
    This paper presents new machine learning approaches to approximate the solutions of optimal stopping problems. The key idea of these methods is to use neural networks, where the parameters of the hidden layers are generated randomly and only the last layer is trained, in order to approximate the continuation value. Our approaches are applicable to high dimensional problems where the existing approaches become increasingly impractical. In addition, since our approaches can be optimized using simple linear regression, they are easy to implement and theoretical guarantees are provided. Our randomized reinforcement learning approach and randomized recurrent neural network approach outperform the state-of-the-art and other relevant machine learning approaches in Markovian and non-Markovian examples, respectively. In particular, we test our approaches on Black-Scholes, Heston, rough Heston and fractional Brownian motion. Moreover, we show that they can also be used to efficiently compute Greeks of American options.  ( 2 min )
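    The core trick, a network whose hidden layer is random and frozen while only the output layer is fit by (closed-form) linear regression, can be sketched on a generic 1-D regression stand-in for the continuation value; the stopping-problem machinery itself is omitted, and the feature scales below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomized network: hidden weights are sampled once and frozen; only the
# linear output layer is trained, via ridge-regularized least squares.
n_hidden = 200
W = rng.normal(size=(1, n_hidden))          # frozen input-to-hidden weights
b = rng.uniform(-3.0, 3.0, size=n_hidden)   # frozen biases

def features(x):
    return np.maximum(x[:, None] @ W + b, 0.0)   # random ReLU features

x_train = rng.uniform(-3.0, 3.0, size=1000)
y_train = np.sin(x_train)                   # generic stand-in for a continuation value
Phi = features(x_train)
beta = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(n_hidden), Phi.T @ y_train)

x_test = np.linspace(-3.0, 3.0, 200)
mse = np.mean((features(x_test) @ beta - np.sin(x_test)) ** 2)
print(mse)   # small test error despite training no hidden-layer weights
```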
    Semi-Discriminative Representation Loss for Online Continual Learning. (arXiv:2006.11234v4 [stat.ML] UPDATED)
    The use of episodic memory in continual learning has demonstrated effectiveness for alleviating catastrophic forgetting. In recent studies, gradient-based approaches have been developed to make more efficient use of compact episodic memory. Such approaches refine the gradients resulting from new samples by those from memorized samples, aiming to reduce the diversity of gradients from different tasks. In this paper, we clarify the relation between diversity of gradients and discriminativeness of representations, showing shared as well as conflicting interests between Deep Metric Learning and continual learning, thus demonstrating pros and cons of learning discriminative representations in continual learning. Based on these findings, we propose a simple method -- Semi-Discriminative Representation Loss (SDRL) -- for continual learning. In comparison with state-of-the-art methods, SDRL shows better performance with low computational cost on multiple benchmark tasks in the setting of online continual learning.  ( 2 min )
    Optimal Training of Fair Predictive Models. (arXiv:1910.04109v3 [stat.ML] UPDATED)
    Recently there has been sustained interest in modifying prediction algorithms to satisfy fairness constraints. These constraints are typically complex nonlinear functionals of the observed data distribution. Focusing on the path-specific causal constraints proposed by Nabi and Shpitser (2018), we introduce new theoretical results and optimization techniques to make model training easier and more accurate. Specifically, we show how to reparameterize the observed data likelihood such that fairness constraints correspond directly to parameters that appear in the likelihood, transforming a complex constrained optimization objective into a simple optimization problem with box constraints. We also exploit methods from empirical likelihood theory in statistics to improve predictive performance by constraining baseline covariates, without requiring parametric models. We combine the merits of both proposals to optimize a hybrid reparameterized likelihood. The techniques presented here should be applicable more broadly to fair prediction proposals that impose constraints on predictive models.  ( 2 min )
    Learning Optimal Dynamic Treatment Regimes Using Causal Tree Methods in Medicine. (arXiv:2204.07124v1 [stat.ML])
    Dynamic treatment regimes (DTRs) are used in medicine to tailor sequential treatment decisions to patients by considering patient heterogeneity. Common methods for learning optimal DTRs, however, have shortcomings: they are typically based on outcome prediction and not treatment effect estimation, or they use linear models that are restrictive for patient data from modern electronic health records. To address these shortcomings, we develop two novel methods for learning optimal DTRs that effectively handle complex patient data. We call our methods DTR-CT and DTR-CF. Our methods are based on a data-driven estimation of heterogeneous treatment effects using causal tree methods, specifically causal trees and causal forests, that learn non-linear relationships, control for time-varying confounding, are doubly robust, and explainable. To the best of our knowledge, our paper is the first that adapts causal tree methods for learning optimal DTRs. We evaluate our proposed methods using synthetic data and then apply them to real-world data from intensive care units. Our methods outperform state-of-the-art baselines in terms of cumulative regret and percentage of optimal decisions by a considerable margin. Our work improves treatment recommendations from electronic health records and is thus of direct relevance for personalized medicine.  ( 2 min )
    Gradient boosting for convex cone predict and optimize problems. (arXiv:2204.06895v1 [cs.LG])
    Many problems in engineering and statistics involve both predictive forecasting and decision-based optimization. Traditionally, predictive models are optimized independently from the final decision-based optimization problem. In contrast, a `smart, predict then optimize' (SPO) framework optimizes prediction models to explicitly minimize the final downstream decision loss. In this paper we present dboost, a gradient boosting algorithm for training prediction model ensembles to minimize decision regret. The dboost framework supports any convex optimization program that can be cast as a convex quadratic cone program, and gradient boosting is performed by implicit differentiation of a custom fixed-point mapping. To our knowledge, the dboost framework is the first general-purpose implementation of gradient boosting for predict-and-optimize problems. Experimental results comparing with state-of-the-art SPO methods show that dboost can further reduce out-of-sample decision regret.  ( 2 min )
    Streamlined Variational Inference for Linear Mixed Models with Crossed Random Effects. (arXiv:1910.01799v3 [stat.ME] UPDATED)
    We derive streamlined mean field variational Bayes algorithms for fitting linear mixed models with crossed random effects. In the most general situation, where the dimensions of the crossed groups are arbitrarily large, streamlining is hindered by lack of sparseness in the underlying least squares system. Because of this fact we also consider a hierarchy of relaxations of the mean field product restriction. The least stringent product restriction delivers a high degree of inferential accuracy. However, this accuracy must be weighed against its higher storage and computing demands. Faster sparse storage and computing alternatives are also provided, but come with the price of diminished inferential accuracy. This article provides full algorithmic details of three variational inference strategies, presents detailed empirical results on their pros and cons and, thus, guides users on their choice of variational inference approach depending on the problem size and computing resources.  ( 2 min )
    Using Machine Learning for Particle Identification in ALICE. (arXiv:2204.06900v1 [nucl-ex])
    Particle identification (PID) is one of the main strengths of the ALICE experiment at the LHC. It is a crucial ingredient for detailed studies of the strongly interacting matter formed in ultrarelativistic heavy-ion collisions. ALICE provides PID information via various experimental techniques, allowing for the identification of particles over a broad momentum range (from around 100 MeV/$c$ to around 50 GeV/$c$). The main challenge is how to combine the information from various detectors effectively. Therefore, PID represents a model classification problem, which can be addressed using Machine Learning (ML) solutions. Moreover, the complexity of the detector and richness of the detection techniques make PID an interesting area of research also for the computer science community. In this work, we show the current status of the ML approach to PID in ALICE. We discuss the preliminary work with the Random Forest approach for the LHC Run 2 and a more advanced solution based on Domain Adaptation Neural Networks, including a proposal for its future implementation within the ALICE computing software for the upcoming LHC Run 3.  ( 2 min )

  • Open

    [D] What is the difference between channel-wise and self attention in this case?
    Example: I fed a 6x6x32 tensor (32 feature maps of dimension 6x6) into a Squeeze and Excitation layer, which assigns a weight to each of my channels through a channel-wise attention mechanism. What is the difference between passing these 32 feature maps into a Hybrid Transformer Encoder with patches of dimension 6x6 (so 1 patch for each channel)? As I understand it, channel attention says "which channel is important for the final prediction", while a transformer (with self attention) tells us "where to focus our attention in a given context". Isn't that the same if the patches are the channels? Basically it tells us which patch to focus on, and if patch=channel then squeeze excitation = self attention? submitted by /u/Rogitus [link] [comments]  ( 1 min )
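    A toy numpy sketch of the distinction the question raises (the shapes and the simplified SE gate are illustrative assumptions, not either paper's exact design): even with one patch per channel, self-attention mixes content across channels, whereas squeeze-and-excitation only rescales each channel by a scalar.

```python
import numpy as np

rng = np.random.default_rng(0)
C, H, W = 32, 6, 6
fmaps = rng.standard_normal((C, H, W))      # 32 feature maps of 6x6 (toy data)

# Squeeze-and-Excitation style channel attention: one scalar gate per
# channel, computed from that channel's global average. (Real SE blocks
# insert a small two-layer MLP here; omitted to keep the contrast sharp.)
squeeze = fmaps.mean(axis=(1, 2))           # (C,)
gates = 1.0 / (1.0 + np.exp(-squeeze))      # sigmoid excitation
se_out = fmaps * gates[:, None, None]       # channel i is only *rescaled*

# Self-attention with one patch per channel: tokens are the flattened
# channels, and every output token is a mixture of *all* channels.
tokens = fmaps.reshape(C, H * W)            # (C, 36), patch == channel
scores = tokens @ tokens.T / np.sqrt(H * W) # (C, C) pairwise similarities
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)     # softmax over channels
sa_out = (attn @ tokens).reshape(C, H, W)   # channel i mixes in channel j

# So the two are not the same: SE reweights each channel by an
# independent scalar, while self-attention also exchanges content
# between channels via the (C, C) attention matrix.
```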
    [P] Bounding.ai Launches New Marketplace for AI Labeled Data
    In a new announcement, Bounding.ai launched its marketplace for computer vision and AI teams to access training data easily. The platform is designed to empower individuals and small companies around the world to create and sell datasets that will be instantly accessible by any team in need of labeled data. Bounding.ai Launches New Marketplace for AI Labeled Data & $5,000 Prize submitted by /u/Freyr_AI [link] [comments]  ( 1 min )
    [D] Unsupervised classification of words/phrases?
    I have found most unsupervised text classification methods to be mostly suitable for classifying documents containing relatively large amounts of words/sentences. However, I have a dataset with entries containing only single words or phrases but not full sentences. The goal is to do unsupervised semantic classification on these words/phrases. Are there any existing algorithms for such a task? submitted by /u/Comprehensive-Egg707 [link] [comments]  ( 1 min )
    [N] Robot Arm Acts As "Hand And Eyes" of Language Model To Execute Real World Tasks With SayCan And Robotics At Google
    Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could in principle be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack contextual grounding, which makes it difficult to leverage them for decision making within a given real-world context. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide this grounding by means of pretrained behaviors, which are used to condition the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model’s “hands and eyes,” while the language model supplies high-level semantic knowledge about the task. We show how low-level tasks can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally extended instructions, while value functions associated with these tasks provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. Github: https://say-can.github.io/ Video of Robot Executing Commands: https://youtu.be/zOph99BjRqs?t=4 submitted by /u/SlightSituation [link] [comments]  ( 1 min )
    [D] AskScience AMA Series: We are seven leading scientists specializing in the intersection of machine learning and neuroscience. Ask Us Anything about computational neuroscience or science education!
    submitted by /u/blueneuronDOTnet [link] [comments]  ( 2 min )
    [D] How DALL-E 2 Actually Works
    Here's a video explaining the overall architecture of DALL-E 2 and how it actually works! Great overview for those who haven't had time to read the paper How does DALL-E 2 actually work? submitted by /u/SleekEagle [link] [comments]  ( 1 min )
    [N] Announcing the Learning on Graphs Conference!
    We think this new venue will be valuable for the Graph/Geometric Machine Learning community. Why? See our blogpost: https://michael-bronstein.medium.com/announcing-the-learning-on-graphs-conference-c63caed7347 The LoG Conference key facts: - Covers work broadly related to machine learning on graphs and geometry - Proceedings track published in PMLR - Also has a non-archival extended abstract track - Double blind review process on OpenReview - Top reviewers receive monetary rewards - First year: virtual December 9-12 2022, free to attend. Call for papers: https://logconference.github.io/cfp/ Stay updated via Twitter: https://twitter.com/LogConference Or LinkedIn: https://www.linkedin.com/company/log-conference Advisory board: Regina Barzilay (MIT), Xavier Bresson (NUS), Michael Bronstein (Oxford/Twitter), Stephan Günnemann (TUM), Stefanie Jegelka (MIT), Jure Leskovec (Stanford), Pietro Liò (Cambridge), Jian Tang (MILA/HEC Montreal), Jie Tang (Tsinghua), Petar Veličković (DeepMind), Soledad Villar (JHU), Marinka Zitnik (Harvard). Organizers: Yuanqi Du (DP Technology), Hannes Stärk (MIT), Derek Lim (MIT), Chaitanya Joshi (Cambridge), Andreea-Ioana Deac (Mila), Iulia Duta (Cambridge), Joshua Robinson (MIT). submitted by /u/Hannes-Stark [link] [comments]  ( 1 min )
    [P] Using Language Models to (probably) Read Faster
    I explored using language models to highlight the more salient parts of a PDF file, which hopefully helps users read faster. The main idea is to highlight only the characters which the language model failed to predict. I have implemented this as an experimental feature in the sioyek PDF reader. Here is a blog post explaining this in full detail: https://ahrm.github.io/jekyll/update/2022/04/14/using-languge-models-to-read-faster.html submitted by /u/highergraphic [link] [comments]  ( 2 min )
    [P] GANs and generating visually indeterminate images by error
    (Please correct me if I'm using the wrong flair/on the wrong sub) I'm currently working on a project that focuses on GANs and generative art, particularly images that concern visual indeterminacy. I'm trying to find papers/articles that discuss the development/application of (any kind of) GAN in which, along the way or as a final result, images were generated that would be considered visually indeterminate. Specifically, research in which the objective was to generate images with clear, recognizable objects/scenes. In my mind I'm looking for articles in which the GAN architecture is discussed, and which parts of it could have influenced particular aspects of the incorrectly generated images. This probably wouldn't be the focus of any research, but I was wondering if anyone has ever come across a discussion section in a GAN paper, or could point me towards some areas or projects where I might find something that I could connect to my project. submitted by /u/mel4ncholi4 [link] [comments]  ( 1 min )
    [D] Ensemble methods (e.g. hard voting) in machine learning
    When should we consider ensemble methods in machine learning? Is there any statistical criteria using which we can decide, if doing ensemble may help? submitted by /u/flaubart9 [link] [comments]  ( 2 min )
    [D] How do you understand "Both FF𝐿 and FF𝑆 were 3-layer feed-forward networks with hidden dimensions of 1024 and 256, GeLU as the activation function and a dropout with probability 0.1 applied at their input."? (re-implementing https://arxiv.org/abs/2101.10587v1)
    Hi, I am re-implementing the paper "Low Resource Recognition and Linking of Biomedical Concepts from a Large Ontology" https://arxiv.org/abs/2101.10587v1 . They describe one part of their model as "3-layer feed-forward networks with hidden dimensions of 1024 and 256, GeLU as the activation function and a dropout with probability 0.1 applied at their input." For me that's not enough information to uniquely characterize the network, but maybe for someone with more experience the intended structure is obvious. The output should be a scalar, so I assume that it's something like: Dropout(0.1) -> Linear(, 1024) -> Linear(1024, 256) -> GELU -> Linear(256, 1)? Or is the nonlinearity (GELU) normally applied after each step? submitted by /u/ldorigo [link] [comments]
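    For what it's worth, here is a dependency-free numpy sketch of one common reading of that spec: dropout at the input, three linear layers (d -> 1024 -> 256 -> 1), and GELU after each hidden layer. The input width d and the initialization are made-up placeholders; the paper does not pin these details down.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn(x, weights, train=False, p_drop=0.1, rng=None):
    """One common reading of the spec: dropout on the input, then
    Linear(d, 1024) -> GELU -> Linear(1024, 256) -> GELU -> Linear(256, 1)."""
    if train:
        mask = (rng.random(x.shape) >= p_drop) / (1.0 - p_drop)
        x = x * mask                # inverted dropout at the input only
    (W1, b1), (W2, b2), (W3, b3) = weights
    h = gelu(x @ W1 + b1)           # (*, 1024)
    h = gelu(h @ W2 + b2)           # (*, 256)
    return h @ W3 + b3              # (*, 1) scalar score

rng = np.random.default_rng(0)
d = 768  # hypothetical input width, e.g. a BERT-sized embedding
weights = [
    (0.02 * rng.standard_normal((d, 1024)), np.zeros(1024)),
    (0.02 * rng.standard_normal((1024, 256)), np.zeros(256)),
    (0.02 * rng.standard_normal((256, 1)), np.zeros(1)),
]
score = ffn(rng.standard_normal((4, d)), weights)   # batch of 4 -> (4, 1)
```

    The alternative in the post (GELU only once, after the second layer) would make the first two linear layers collapse into one affine map, which is why the activation-after-each-hidden-layer reading is usually assumed.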
    Why did SciNet not get more attention? [D]
    It seems to shatter previous benchmarks with a new, innovative architecture, yet it only has 3 citations and little to no attention from the community as far as I can see. Is it because time series forecasting is not very trendy right now or is there anything wrong with the paper? The paper in question: https://arxiv.org/pdf/2106.09305v2.pdf submitted by /u/vidul7498 [link] [comments]  ( 2 min )
    [D] Do you train and deploy models using just one framework or multiple frameworks at work?
    Hi, I'm the creator of Pinferencia. Currently I'm designing the new-features to-do list. I want to know: do you train and deploy models using just one framework or multiple frameworks at work? For example, do you use PyTorch for both training and deployment, or TensorFlow/PyTorch for training and ONNX for deployment? View Poll submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 2 min )
    [P] Extremely short and simple implementation of Denoising Diffusion Model, for educational purpose
    Randomly sampled MNIST output. It's not good, I know. Hi, I noticed there aren't that many simple implementations of DDPM, for example using MNIST. I had to make a presentation for my workplace seminar, so I implemented a simplified version of DDPM myself. The whole thing is under 200 lines of code: https://github.com/cloneofsimo/minDiffusion This implementation has MANY missing details, such as Unet models etc. I think it is worth taking a look, especially if you are interested in the recent boom of diffusion models (such as DALL-E 2) submitted by /u/cloneofsimo [link] [comments]  ( 1 min )
    [D] Kubernetes for ML - how are y'all doing it?
    Have been involved with Mesos since 2013, and Kubernetes almost since its inception (and saw it win the "scheduler wars"). It's now being used for pretty much _all_ container workloads, including ML training and inference. Since it was built in the image of Borg (where search indexers and map-reduce jobs were preemptible, and serving search workloads had to be protected at all cost)[1], how is Kubernetes holding up for your current workflows? Are you using Kubeflow? Metaflow? A bespoke setup on top? [1] https://queue.acm.org/detail.cfm?id=2898444 submitted by /u/nqnielsen [link] [comments]  ( 2 min )
  • Open

    New to machine learning, want to simulate robotics in a 3d environment
    My employer makes significant use of robotic weld cells, and while working with the equipment I've noticed what seems to be room for improvement in the programming. This is purely a personal academic project, as I am quite curious whether machine learning could produce comparable or superior results to the human-made programming used at work. However, there will unfortunately be areas of vagueness, because I need to stick to knowledge that is publicly available regarding their operations: I'm going to have to use more generic, publicly available reference material, and will not be able to share most, if any, of the end result. I would like to run simulations in a 3D environment, using machine learning to train a computer program to find the most efficient sequence of movements &…  ( 3 min )
    Does using a centralized critic always mean that the agents receive global observation?
    submitted by /u/No_Possibility_7588 [link] [comments]
    What algorithm would be suited for a “Just do it as good as you can” situation?
    I’m really new to RL so please bear with me if I’m making mistakes here, but I’m trying to make an environment that emulates a network of roads. The algorithm will need to generate a quick route between n destinations, where n is some number with an insane amount of permutations, like 30 for example. This is like emulating the destinations required by a mailman’s route on a map, and trying to find the fastest way to get to each one. The algorithm’s sequence of decisions will be choosing a node to travel to, where each node represents a street intersection or point where the street ends. By the time it’s traveled to every destination using the nodes, it’ll review the network of nodes it used and sum the distance between each one to get the total distance of the route. The goal is to get the total distance as small as possible. Is this realistic for an RL problem, or do I need to try to engineer some way to determine if every decision was either good or bad? Could I build a mathematical way to approximate the quickest route and then reward the RL algorithm for generating a better route than the mathematically approximated one? I could try rewarding the algorithm at each decision based on whether it reduced the total distance required to any target it has yet to visit. I could try to mathematically make this more viable… what do y’all think, should I do something like that? Am I headed in the right direction? Thanks for any and all help! submitted by /u/professorDissociate [link] [comments]  ( 3 min )
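    On the "mathematically approximated route" idea: a classical nearest-neighbour heuristic is a cheap baseline to reward an RL policy against. A hedged sketch (toy coordinates, not a real road network; the distance matrix and point set are illustrative):

```python
import itertools, math

def nearest_neighbor_route(dist, start=0):
    """Greedy baseline: always visit the closest unvisited destination
    next. Not optimal, but cheap to compute and a reasonable yardstick
    for an RL policy's route length."""
    n = len(dist)
    route, unvisited = [start], set(range(n)) - {start}
    while unvisited:
        cur = route[-1]
        nxt = min(unvisited, key=lambda j: dist[cur][j])
        route.append(nxt)
        unvisited.remove(nxt)
    return route

def route_length(dist, route):
    return sum(dist[a][b] for a, b in zip(route, route[1:]))

# Toy symmetric distance matrix over 5 destinations.
pts = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 2)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
greedy = nearest_neighbor_route(dist)

# Brute-force optimum (only feasible for tiny n; for n=30 you would
# compare against the heuristic instead).
best = min(route_length(dist, (0,) + p)
           for p in itertools.permutations(range(1, 5)))
# route_length(dist, greedy) >= best; the gap bounds how much an RL
# policy could still improve on the greedy baseline.
```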
    Where is env.nS for Frozen Lake in OpenAI Gym
    I am trying to run this: env4 = FrozenLakeEnv(map_name='4x4', is_slippery=False) env4.nS I then get this error: 'FrozenLakeEnv' object has no attribute 'nS' But I see it in the source code on lines 151 and 152: https://github.com/openai/gym/blob/master/gym/envs/toy_text/frozen_lake.py Edit: I'm trying to follow along with some tutorials online. Thank you for the help! submitted by /u/postdoc403b [link] [comments]  ( 1 min )
    Getting max/min action in DDPG and TD3
    I am using DDPG for a custom environment. My reward is positive (the sum-rate in a communication system). My problem is that I get the max or min action after a few training steps and it saturates with a non-optimized solution. How can I address this problem? I tried redesigning my reward to include positive and negative values but it didn’t work. I read that some people are using reward scaling. What is it and how would I scale it? I mean is there a specific method? I couldn’t find enough resources on that. Any help is much appreciated! submitted by /u/alicefaisal [link] [comments]  ( 1 min )
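    On "reward scaling": there is no single canonical method, but one common reading is dividing each reward by a running estimate of its standard deviation, so the critic's targets stay roughly unit-scale regardless of the raw sum-rate magnitude. A hedged, dependency-free sketch (the Welford-style update and constants are illustrative, not from any specific paper):

```python
class RunningRewardScaler:
    """Divide each reward by a running std-dev estimate (Welford update).
    Keeps critic targets in a roughly unit range even when raw rewards
    (e.g. sum-rates) are large and always positive."""
    def __init__(self, eps=1e-8):
        self.n, self.mean, self.m2, self.eps = 0, 0.0, 0.0, eps

    def scale(self, r):
        self.n += 1
        delta = r - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (r - self.mean)
        var = self.m2 / self.n if self.n > 1 else 1.0
        return r / ((var ** 0.5) + self.eps)

scaler = RunningRewardScaler()
scaled = [scaler.scale(r) for r in [100.0, 250.0, 90.0, 300.0]]
```

    Note that this rescales magnitude only; it will not by itself fix action saturation, which is often also addressed by reducing exploration noise or the actor learning rate.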
    Question about pseudocodes
    Hi I'm redoing all the RL algorithms in python, to better understanding them. I'm mostly following Sutton and Barto but the pseudo code there is often hard to follow. Do you know any other place where I can look at? submitted by /u/New_neanderthal [link] [comments]  ( 1 min )
    Industry use of reinforcement learning
    I have been studying RL for 18 months now with the goal of getting a job in it. Yet when I look at jobs, I very seldom see postings about it. I am wondering why this is the case? From my current understanding I could think of dozens of applications with huge potential gains. It feels like untapped potential. Or am I missing something? What do you think is the big obstacle to wider adoption of RL? Do you think it overlaps with classical control at the moment and is not justified? submitted by /u/Ouassimf [link] [comments]  ( 4 min )
    Comparing Default VS Custom Reward Function for Optimal Health Management of a DeepRL Agent Playing Tekken
    submitted by /u/DIAMBRA_AIArena [link] [comments]  ( 2 min )
  • Open

    Fine-tune and deploy a Wav2Vec2 model for speech recognition with Hugging Face and Amazon SageMaker
    Automatic speech recognition (ASR) is a commonly used machine learning (ML) technology in our daily lives and business scenarios. Applications such as voice-controlled assistants like Alexa and Siri, and voice-to-text applications like automatic subtitling for videos and transcribing meetings, are all powered by this technology. These applications take audio clips as input and convert speech […]  ( 11 min )
    Build a virtual credit approval agent with Amazon Lex, Amazon Textract, and Amazon Connect
    Banking and financial institutions review thousands of credit applications per week. The credit approval process requires financial organizations to invest time and resources in reviewing documents like W2s, bank statements, and utility bills. The overall experience can be costly for the organization. At the same time, organizations have to consider borrowers, who are waiting for […]  ( 8 min )
  • Open

    AWS Cloud Migration: All You Need to Know
    Businesses today face myriad challenges, some of which are successfully addressed with help from cloud computing. This is where AWS cloud migration comes in: it promises to be a boon for businesses grappling with a sudden increase in traffic, or for those looking for accelerated app deployment. It is also handy for cautious businesses that… Read More »AWS Cloud Migration: All You Need to Know The post AWS Cloud Migration: All You Need to Know appeared first on Data Science Central.  ( 3 min )
  • Open

    Web Crawling in Python
    In the old days, it was a tedious job to collect data, and sometimes very expensive. Machine learning projects cannot […] The post Web Crawling in Python appeared first on Machine Learning Mastery.  ( 12 min )
  • Open

    Kamikaze Drones in Russia’s War Against Ukraine Point to Future "Killer Robots"
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    AI News | Breakthrough AI Robot Arm Understanding From Google | OpenAI DALL-E 2 | AI Edge Computing In Space
    submitted by /u/getrich_or_diemining [link] [comments]  ( 1 min )
    DALL-E (Zero-Shot Text-to-Image Generation) -PART(1/2)
    OpenAI released DALL-E 2 last week; this system is basically capable of generating an image from a text description, and some of the results were truly amazing. In this blog, I tried to discuss the ideas around DALL-E (version 1). DALL-E consists of two main components: a d-VAE (discrete Variational Auto-Encoder) and an auto-regressive transformer. In Part 1 I focused on the d-VAE part, where I tried to talk about the basic VAE and its ELBO formulation, then VQ-VAE, which eventually leads to d-VAE. Its reconstruction loss is formulated from the Logit Laplace (bounded) distribution, unlike the typical L1 or L2. Overall this part explains how a discrete vector (token) can be generated for an input image. submitted by /u/rakshith291 [link] [comments]  ( 1 min )
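    For readers skimming the blurb, the ELBO it mentions is the standard VAE bound (generic notation, not the blog's own):

```latex
\log p_\theta(x)
  \;\ge\;
  \underbrace{\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction term}}
  \;-\;
  \underbrace{D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\Vert\, p(z)\big)}_{\text{regularization toward the prior}}
```

    Maximizing the right-hand side jointly trains the encoder $q_\phi$ and decoder $p_\theta$; in the d-VAE the latent $z$ is a grid of discrete tokens, which is what the autoregressive transformer is then trained on.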
    My first attempt at machine learning. I made a cool chatbot 😎
    I made a self learning conversational chatbot in ReactJS. It does nothing but reply to user messages and only understands text, for now 😄 https://xalen.netlify.app What do you think? Yea or Nay? submitted by /u/GameTide [link] [comments]  ( 1 min )
    Music video about AI
    submitted by /u/starlightinspace [link] [comments]
    Computational reasoning about incomputability, infinity, truth etc (Gödel, Tarski,...)
    So I would be curious about the theoretical foundations for making sense of higher-level abstract reasoning, like reasoning about infinities, incomputability, and truth (which we know cannot be defined, due to Tarski), in the field of artificial intelligence. It seems that due to Gödel-like constructions you are forced into inconsistent systems of reasoning when operating within a computable system. But those prove everything and "nothing", so as far as I understand it, this kind of upends the whole system of reason that the notion of artificial intelligence (and its correct functioning) is based on. Personally, due to this, I don't see the notion in the title as particularly coherent, which means there are somewhat strong limits on what (computable) AI will be able to do. But I would be curious how people who think otherwise (which seems to be most of the AI community?) approach this. Would you say inconsistency can somehow be avoided, or that despite inconsistency you can get reliably correct results? submitted by /u/bejaq [link] [comments]  ( 6 min )
    The best explanation of What is Machine Learning and How it works? MUST WATCH
    submitted by /u/mr-minion [link] [comments]
    Artificial Nightmares: Hills Have Eyes || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
  • Open

    How do I fix this? (I'm trying to predict sine; the blue dots are its guesses and the white line is the “correct” answer) submitted by /u/-i-hate-this-place- [link] [comments]  ( 2 min )
    submitted by /u/-i-hate-this-place- [link] [comments]  ( 2 min )
    Neuroevolution of Augmenting Topologies Course
    Hey all, There's a new course on the Neuroevolution of Augmenting Topologies (NEAT) algorithm. It's a niche algorithm, but uses some very interesting mechanisms to train/evolve simple irregular neural networks. Thought some of you may be interested. submitted by /u/Cogitarius [link] [comments]
  • Open

    Startup Transforms Meeting Notes With Time-Saving Features
    Gil Makleff and Artem Koren are developing AI for meeting transcripts, creating time-savers like shareable highlights of the text that is often TL;DR (too long; didn’t read). The Sembly founders conceived the idea after years of working in enterprise operational consulting at UMT Consulting Group, which was acquired by Ernst & Young. “We had an Read article > The post Startup Transforms Meeting Notes With Time-Saving Features appeared first on NVIDIA Blog.  ( 3 min )
    A Night to Behold: Researchers Use Deep Learning to Bring Color to Night Vision
    Talk about a bright idea. A team of scientists has used GPU-accelerated deep learning to show how color can be brought to night-vision systems.  In a paper published this week in the journal PLOS One, a team of researchers at the University of California, Irvine led by Professor Pierre Baldi and Dr. Andrew Browne, describes how Read article > The post A Night to Behold: Researchers Use Deep Learning to Bring Color to Night Vision appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Data Scientists vs. BI Developer: What’s the Difference?
    Here’s the truth.  ( 1 min )
  • Open

    Learning to think critically about machine learning
    A multidisciplinary team of graduate students helps infuse ethical computing content into MIT’s largest machine learning course.  ( 7 min )

  • Open

    [D] LOOCV
    I'm working with a small dataset (~400 labeled samples). I plan to use logistic regression. Does it make sense / is it necessary to have a hold-out validation set in addition to doing leave-one-out cross-validation (LOOCV)? (E.g. leave 20% out, and run LOOCV on the rest.) submitted by /u/yontbont1 [link] [comments]  ( 1 min )
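    For concreteness, a dependency-free sketch of the LOOCV loop itself: train on n-1 points, test on the held-out one, repeat n times. A 1-nearest-neighbour stand-in replaces logistic regression purely to avoid imports; with scikit-learn you would fit LogisticRegression inside the loop instead. The toy data is made up.

```python
def loocv_accuracy(X, y, fit, predict):
    """Leave-one-out CV. With ~400 samples and logistic regression this
    is cheap, and it already gives an almost-unbiased performance
    estimate, so a separate hold-out split on top is optional."""
    hits = 0
    for i in range(len(X)):
        X_tr = X[:i] + X[i + 1:]
        y_tr = y[:i] + y[i + 1:]
        model = fit(X_tr, y_tr)
        hits += predict(model, X[i]) == y[i]
    return hits / len(X)

# Stand-in learner: 1-nearest neighbour on squared Euclidean distance.
def fit_1nn(X, y):
    return list(zip(X, y))

def predict_1nn(model, x):
    return min(model, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1), (0.2, 0.1), (1.1, 0.9)]
y = [0, 0, 1, 1, 0, 1]
acc = loocv_accuracy(X, y, fit_1nn, predict_1nn)  # 1.0 on this toy set
```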
    [D] What's the probability distribution of the Feature importances in an ensemble method?
    Assuming feature importance as defined by mean decrease in impurity, I'm curious if there are any studies about its distribution. I'm thinking about using a statistical test to check whether a feature is relevant or not, but all I can find is using the standard deviation as a measurement of noise. Additionally, I wonder if we can give the probability of one feature being more relevant than another given their feature importances. submitted by /u/FellowOfHorses [link] [comments]  ( 1 min )
    [Discussion] Collecting Feedback for FinRL: Financial Reinforcement Learning
    Dear all, As the creator of the open-source FinRL project, I would like to welcome all kinds of feedback regarding financial reinforcement learning, especially about how to improve the open-source project FinRL. After several years of development and maintenance, we have passed the phase of caring about #stars; now we care more about #downloads, as well as Wall Street adoption. I appreciate your feedback and sharing! Previously, when we posted our message on Reddit, the community was not very supportive of open-source projects' "advertisements"; maybe it consumed public attention and raised bad feelings. Therefore, this time we created a Reddit sub-channel for FinRL-related discussions, available at: https://www.reddit.com/r/AI4Finance_FinRL/ Best, Yang submitted by /u/Character-Meat-9176 [link] [comments]  ( 1 min )
    [D] What fun things in ML would you give a presentation on?
    If you had 30 minutes to present something fun and exciting to a semi-technical audience, what would you talk about on Machine Learning that would gain interest and engagement? submitted by /u/aero_gsr [link] [comments]  ( 1 min )
    [D] Evaluation and iteration for production models - how?
    How do you evaluate and improve your models in production (particularly for complex modalities like text/vision/audio)? Good models are hard In my experience from managing our CNN-based text classification & NER model at a small media analytics startup, evaluating and improving models is a mess. Our domain is fairly niche and diverse, so getting enough training data has been challenging and I mix in custom synthetics & augmentations (which can cause weird model artifacts if you're not careful). It takes a lot of time to discover tricky failure cases by either 1) observing production traffic or 2) probing manually, and then it takes even more time to get the right data to improve model behavior. Are good models hard? What's your approach to model evaluation & targeted improvement? Are there any known best practices? I'm a bit at a loss here. As mentioned, I'm specifically interested in others who have deep models as an important part of their product or pipeline across any task or modality. More particularly: How wrong is your model? How do you test it? How would you know about errors before and after it's deployed? How much of your time do you spend on iterating on your models? For what kind of issue? Which aspects are most useful to you for improving model performance and reducing critical errors? Maybe I'll take some of the more general ideas from my work and build them out into an evaluation & iteration framework. It's currently a hybrid web of synthetic, interactive/probing and classical approaches. Or maybe there is some approach/library that makes iteration easier without me having to do anything :) submitted by /u/flotothemoon [link] [comments]  ( 2 min )
    [D] Trace norm in KFAC paper for regularization
    Hi, I suspect that the trace norm of the Kronecker product is mistaken in the KFAC paper (https://arxiv.org/abs/1503.05671). Shouldn't the division in the blue mark be replaced by multiplication? https://preview.redd.it/quoxpzubfgt81.png?width=1241&format=png&auto=webp&s=19e7b60628302f3cb37ba42944088d89d7a7bd28 submitted by /u/Cautious_Proposal132 [link] [comments]  ( 1 min )
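    As a sanity check on the question (a general linear-algebra identity, not a claim about the paper's surrounding derivation): the singular values of a Kronecker product are the pairwise products of the factors' singular values, so the trace norm is multiplicative over the Kronecker product:

```latex
\sigma_{ij}(A \otimes B) = \sigma_i(A)\,\sigma_j(B)
\quad\Longrightarrow\quad
\|A \otimes B\|_{*}
  \;=\; \sum_{i,j} \sigma_i(A)\,\sigma_j(B)
  \;=\; \Big(\sum_i \sigma_i(A)\Big)\Big(\sum_j \sigma_j(B)\Big)
  \;=\; \|A\|_{*}\,\|B\|_{*}
```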
    [D] To what extent can Rust be used for Machine Learning?
    I recently saw that some parts of the HuggingFace ecosystem use Rust under the hood, and HF is a large ecosystem. I've also heard from some of my friends that they had to learn Rust as the first thing at an ML company (it's their first job, so they couldn't explain to me exactly why). My questions are: What are the pros and cons over Python? Are there any good frameworks in Rust for ML? Is there a decent community & documentation for Rust? Is learning it a fun experience? Is it used only for deployment? The reason I'm asking is that I really love to learn by doing. And so, if I engaged in learning a bit of Rust for ML purposes, would I be able to create something ML-like right off the bat? It can be something as simple as an MNIST classifier. Take note that I don't know anything about Rust, so these questions might seem noob-like. But I believe that the answers can be of help to others as well. submitted by /u/Icy_Fisherman7187 [link] [comments]  ( 5 min )
    [D] Tips for using Ensemble Learning with a small dataset
    I have started to look into using an ensemble of relatively shallow MLPs to predict using a small dataset (~100 training samples). I was looking specifically at bagging (bootstrap aggregation) as a possibility of improving prediction accuracy. I was curious if there were any heuristics for how many models to include in a bagging ensemble? Also, more generally, am I on the correct path, or is there a better direction given my situation? A different ensemble technique, or a different path all together? Any advice would be appreciated. submitted by /u/Fritos121 [link] [comments]  ( 1 min )
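For what it's worth, a common heuristic is to grow the ensemble until the out-of-bag (OOB) error stops improving; with ~100 samples, somewhere in the range of 10-50 models is a typical starting point. A minimal sketch of hand-rolled bootstrap aggregation over small sklearn MLPs, where the dataset, model size, and n_models=25 are all assumptions for illustration:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
# Hypothetical stand-in for the ~100-sample dataset.
X = rng.standard_normal((100, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(100)

n_models = 25  # grow until OOB error plateaus
models, oob_preds, oob_counts = [], np.zeros(len(y)), np.zeros(len(y))
for seed in range(n_models):
    idx = rng.integers(0, len(y), len(y))        # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(len(y)), idx)   # rows left out of this bootstrap
    m = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                     random_state=seed).fit(X[idx], y[idx])
    models.append(m)
    oob_preds[oob] += m.predict(X[oob])
    oob_counts[oob] += 1

mask = oob_counts > 0
oob_mse = np.mean((oob_preds[mask] / oob_counts[mask] - y[mask]) ** 2)
print(f"OOB MSE with {n_models} models: {oob_mse:.3f}")
```

The OOB estimate gives you a validation signal without carving a holdout set out of an already tiny dataset, which is the main appeal of bagging in this regime.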
    [D] What JAX NN library to use?
    I've been exploring the jax ecosystem and its many neural network libraries but I can't seem to settle on one. The main 5 which i am considering are Trax, Objax, Equinox, Flax, and Elegy, however I would like to hear which jax NN lib you use and why. submitted by /u/Southern-Trip-1102 [link] [comments]
    Locked-image Tuning: Adding Language Understanding to Image Models
    Posted by Andreas Steiner and Basil Mustafa, Research Software Engineers at Google Research, Brain team The ability to classify images into categories has been transformed by deep learning. It has also been significantly accelerated by transfer learning, whereby models are first pre-trained on large datasets, like ImageNet, to learn visual representations that are then transferred via fine-tuning to a new task with less data (e.g., classifying animals). Previous works such as BiT and ViT employed these methods to achieve state-of-the-art performance on a wide range of classification tasks, such as the VTAB benchmark. However, fine-tuning has some downsides: though pre-training is done only once, fine-tuning is necessary on every new dataset for which task-specific data is needed. Multimo…  ( 7 min )
    questions to ask an AI?
    i recently played a game called tacoma that had a focus on AI. in the game there was a guide for AI that showed 4 hypotheticals to ask an AI to check its morality, and it got me thinking how useful that would be for a real self-aware intelligence. so i want to make a list of questions/hypotheticals to ask AGIs. if you had to interview a recently created sentient AI, what questions or hypotheticals would you give it to gauge its morality, intelligence, creativity, emotion etc.? submitted by /u/neonvolta [link] [comments]  ( 1 min )
    YouTuber Meets His Creepy Robot Double and Freaks Out
    submitted by /u/estasfuera [link] [comments]
    Reference request for applications of time to ai
    Does anyone know of any AI papers, books articles etc that discuss using a sense of time to develop AI, (especially real world time)? I've come across papers that discuss how having a sense of time seems to play a role in animal cognition (e.g. temporal cognition), and I'm curious to what extent this has influenced the development of AI. Thanks in advance submitted by /u/patterntheoryacc [link] [comments]  ( 1 min )
    IBM Data Science and AI Programs on Coursera Free for 30 Days
    submitted by /u/awsconsultant [link] [comments]
    Are you aware of these AI Ethical Challenges?
    submitted by /u/JencyJane [link] [comments]
    Synthetic²: Can AI Be A Powerful Force For Creation? | SiGMA/AGS UAE 2022
    submitted by /u/thedyezwfl [link] [comments]
    Google finance chief: "We automate everything that can be automated"
    submitted by /u/much_successes [link] [comments]
    Free Webinar series | Automated CV Pipelines | Instance Classification
    Automated CV Pipelines 3rd part is open for registration. It will be covering the methods of streamlining instance classification. If you are interested to check out, here is the link to register. submitted by /u/WeekendClassic [link] [comments]
    "My A.I. writes music better than humans. World-class education in A.I. + music -> decades of work -> censored from Facebook, Twitter, soon to be downvoted or unfairly-banned from Reddit. It's making the most beautiful music I've ever heard, and society despises it."
    Thirty years it's taken me, A.I. that is not just as good as humans but better than humans at composing music: https://i.imgur.com/hReXJq1.png It passes the Turing Test, and it is also a revolution in the field of music in and of itself. In the meantime, no one has said anything nice to me in thirty years; just insults. I would feel dumb rewarding humanity with my creation; it would send the wrong message; it would affirm their bad behavior. Garbage species. Low IQ. submitted by /u/PussyFiller2022 [link] [comments]  ( 1 min )
    PPO with one worker always picking the best action?
    If I use PPO with distributed workers, and one of the workers always picks the best action, would that skew the PPO algorithm? It might perform a tad slower, but would it actually introduce incorrect math? Perhaps because the PPO optimization requires that all actions are taken in proportion to their probabilities? Or would it (mathematically) not matter? submitted by /u/tmuxed [link] [comments]  ( 1 min )
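One way to see the issue concretely: PPO's surrogate objective is an expectation under the behavior policy, estimated by sampling actions from it. A worker that always takes the argmax is effectively sampling from a different, deterministic policy, so an unweighted average over its transitions is biased unless corrected with importance weights. A toy numeric sketch (a hypothetical 3-action policy, not an actual PPO run):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.7, 0.2, 0.1])   # current policy pi over 3 actions
f = np.array([1.0, 5.0, 10.0])      # some per-action quantity (e.g. an advantage term)
true_value = (probs * f).sum()      # exact E_pi[f] = 2.7

n = 100_000
on_policy = rng.choice(3, size=n, p=probs)   # normal workers: sample from pi
greedy = np.zeros(n, dtype=int)              # worker that always picks argmax (action 0)
mixed = np.concatenate([on_policy, greedy])

print(f"true E_pi[f]             = {true_value:.3f}")
print(f"on-policy estimate       = {f[on_policy].mean():.3f}")
print(f"with greedy worker mixed = {f[mixed].mean():.3f}")  # noticeably biased
```

The on-policy average matches the true expectation; mixing in the greedy worker's samples without reweighting pulls the estimate toward the argmax action's value.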
    Feedback Collection for FinRL: Financial Reinforcement Learning
    Dear all, As the creator of the open-source FinRL project, I would like to welcome all kinds of feedback on financial reinforcement learning, and especially on how to improve the open-source FinRL project. After several years of development and maintenance, we have passed the phase of caring about #stars; now we care more about #downloads, and also about adoption on Wall Street. We appreciate your feedback and sharing! Previously, when we posted our message on Reddit, the community was not very supportive of open-source projects' "advertisements". Maybe it consumed public attention and raised bad feelings. Therefore, this time we created a Reddit sub-channel for FinRL-related discussions, available at: https://www.reddit.com/r/AI4Finance_FinRL/ Best, Yang submitted by /u/Character-Meat-9176 [link] [comments]  ( 1 min )
    Is a steady linear increase in average reward during training too good to be true? Are there any common pitfalls?
    submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    Determine Gridworld values with no probability
    I am learning Reinforcement learning for games following Gridworld examples. Apologies in advance if this is a basic question; I am very new to reinforcement learning. I am slightly confused in scenarios where the probabilities of moving up, down, left and right are not provided or stated. In this scenario, I assume the optimal policy applies, and therefore you would apply the Bellman equation as: V(s) = max_a [ R(s, a) + γ V(s') ]. The cost for any movement is 0, and an agent can choose to terminate at a numbered grid square to collect a reward equal to the grid number. This is why my square closest to the reward takes the value 8, since it will terminate with the action to the next state to collect the reward. Would this be the correct way to determine the values for the surrounding grid squares? https://preview.redd.it/s9l0ok4kbgt81.png?width=806&format=png&auto=webp&s=dfb50450001541b0569d0361fd04a73daa29f222 submitted by /u/Artezian [link] [comments]  ( 2 min )
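To make that update concrete, here is a minimal value-iteration sketch for a deterministic gridworld under one possible reading of the setup: cost-free moves, a discount of γ = 0.8, and the option to terminate on a square to collect its reward. The 3×3 layout with a single reward of 10 is a hypothetical stand-in for the screenshot (under this reading, the neighbouring square converges to γ · 10 = 8):

```python
import numpy as np

rewards = np.array([[0., 0., 0.],
                    [0., 0., 0.],
                    [0., 0., 10.]])   # hypothetical layout: reward 10 at bottom-right
gamma = 0.8
V = np.zeros_like(rewards)
for _ in range(50):                   # sweep until convergence
    new_V = np.copy(rewards)          # option 1: terminate here, collect R(s)
    for i in range(3):
        for j in range(3):
            for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                ni, nj = i + di, j + dj
                if 0 <= ni < 3 and 0 <= nj < 3:
                    # option 2: move (cost 0), then behave optimally from s'
                    new_V[i, j] = max(new_V[i, j], gamma * V[ni, nj])
    V = new_V
print(V)  # neighbours of the reward square converge to gamma * 10 = 8
```

Since there are no transition probabilities, the max over actions replaces the usual expectation, which is exactly the assumption in the question.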
    Three from MIT awarded 2022 Paul and Daisy Soros Fellowships for New Americans
    Fellowship funds graduate studies for outstanding immigrants and children of immigrants.  ( 6 min )
    Latest Research From Stanford Introduces ‘Domino’: A Python Tool for Identifying and Describing Underperforming Slices in Machine Learning Models
    Machine learning and Artificial Intelligence models have gained promising results in recent years. The major factor behind their success is the availability and development of vast datasets. However, regardless of how many terabytes of data you have or how skilled you are at data science, machine learning models will be useless and even dangerous if you can’t make sense of data records. A slice is a collection of data samples with a common feature. For example, in a picture dataset, photographs of antique vehicles make up a slice. When a model’s performance on the data samples in a slice is significantly lower than its overall performance, the slice is considered underperforming. Deploying models underperforming on crucial data slices could seriously harm safety and fairness. For instance, models trained to detect collapsed lungs in chest X-rays generally make predictions based on the presence of chest drains, a common therapeutic device. As a result, computer models typically fail to detect collapsed lungs in images without chest drains, a critical data slice in which inaccurate negative predictions could be catastrophic. Not many studies have considered underperforming slices during model evaluation. Researchers believe that knowing which slices their models underperform would help practitioners not just make better decisions regarding model deployment but also improve model robustness by upgrading the training dataset or utilizing robust optimization strategies. Detecting slices is challenging because the “hidden” data slices are linked by a notion that isn’t easily derived from unstructured inputs or labeled in metadata (e.g., images, video, time-series data). Continue reading the summary Paper: https://arxiv.org/pdf/2203.14960.pdf Article: http://ai.stanford.edu/blog/domino/ Github: https://github.com/HazyResearch/domino submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    NN from Scratch: #3 Forward propagation | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    AI-generated easter eggs
    How would AI decorate an easter egg? I've tried this before by training an image-generating model exclusively on pictures of easter eggs I decorated (they came out plain, if a bit wobbly). I decided to see what I would get using a model based on CLIP, which has  ( 3 min )
    Bonus: What does the x-ray of an Easter egg look like?
    AI Weirdness: the strange side of machine learning  ( 1 min )
    R-Learning AI self-taking over processes
    An inside look at how REINFORCEMENT learning, without past reference, extracts “optimal” decisions through simple interaction …  ( 10 min )
    GFN Thursday Gears Up With More Electronic Arts Games on GeForce NOW
    This GFN Thursday delivers more gr-EA-t games as two new titles from Electronic Arts join the GeForce NOW library. Gamers can now enjoy Need for Speed HEAT  and Plants vs. Zombies Garden Warfare 2 streaming from GeForce NOW to underpowered PCs, Macs, Chromebooks, SHIELD TV and mobile devices. It’s all part of the eight  total Read article > The post GFN Thursday Gears Up With More Electronic Arts Games on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )
    How AI is Changing Digital Marketing
    What is Artificial Intelligence? Oxford Languages defines AI as the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages. For those of us working in the realm of digital marketing, the impact has become even more clear over… The post How AI is Changing Digital Marketing appeared first on Data Science Central.  ( 4 min )
    AI For Compliance: What, Why, How
    With the constant rise and use of technology, Artificial Intelligence (AI) has become a great companion to compliance. Compliance is one of the biggest playing fields and plays a pivotal role in banking institutions. It aims to identify, diminish, and manage risks such as insider trading, spoofing attacks, exploitation of the market, front-running, and more by… The post AI For Compliance: What, Why, How appeared first on Data Science Central.  ( 9 min )
    Benefits of Data Governance and Compliance
    While data compliance is the practice of organizations ensuring that all sensitive data is managed and organized in a way that enables them to meet their business rules alongside legal and governmental regulations, data governance involves the process of managing organizational data’s usability, security, availability, and quality using the internally set rules and policies. Data… The post Benefits of Data Governance and Compliance appeared first on Data Science Central.  ( 3 min )
    How To Write A Technical Dissertation
    Technical dissertation writing sometimes seems impossible until it is done. A dissertation is among the lengthiest tasks that can take months to get completed. Thus, it exhausts students, but there is no way around it. It is worth more than about 60 credits in a thesis-based degree. Moreover, gathering proper knowledge and top guidelines about… The post How To Write A Technical Dissertation appeared first on Data Science Central.  ( 5 min )
    Why Data Engineers are in Greater Demand than Data Scientists
    Globally, many think that data scientist is the best job after Harvard declared it to be one of the hottest jobs of the decade.  And since then, many have been choosing it as their career path. But the role of a data engineer is as important as the data scientist is, because if a data… The post Why Data Engineers are in Greater Demand than Data Scientists appeared first on Data Science Central.  ( 3 min )
    When to Go, and When to Explore: The Benefit of Post-Exploration in Intrinsic Motivation. (arXiv:2203.16311v2 [cs.LG] UPDATED)
    Go-Explore achieved breakthrough performance on challenging reinforcement learning (RL) tasks with sparse rewards. The key insight of Go-Explore was that successful exploration requires an agent to first return to an interesting state ('Go'), and only then explore into unknown terrain ('Explore'). We refer to such exploration after a goal is reached as 'post-exploration'. In this paper we present a systematic study of post-exploration, answering open questions that the Go-Explore paper did not answer yet. First, we study the isolated potential of post-exploration, by turning it on and off within the same algorithm. Subsequently, we introduce new methodology to adaptively decide when to post-explore and for how long to post-explore. Experiments on a range of MiniGrid environments show that post-exploration indeed boosts performance (with a bigger impact than tuning regular exploration parameters), and this effect is further enhanced by adaptively deciding when and for how long to post-explore. In short, our work identifies adaptive post-exploration as a promising direction for RL exploration research.
    Estimators of Entropy and Information via Inference in Probabilistic Models. (arXiv:2202.12363v2 [stat.ML] UPDATED)
    Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient's insulin sensitivity, given their meal and medication schedule.
    Estimating permeability of 3D micro-CT images by physics-informed CNNs based on DNS. (arXiv:2109.01818v2 [cs.LG] UPDATED)
    In recent years, convolutional neural networks (CNNs) have experienced an increasing interest in their ability to perform a fast approximation of effective hydrodynamic parameters in porous media research and applications. This paper presents a novel methodology for permeability prediction from micro-CT scans of geological rock samples. The training data set for CNNs dedicated to permeability prediction consists of permeability labels that are typically generated by classical lattice Boltzmann methods (LBM) that simulate the flow through the pore space of the segmented image data. We instead perform direct numerical simulation (DNS) by solving the stationary Stokes equation in an efficient and distributed-parallel manner. As such, we circumvent the convergence issues of LBM that frequently are observed on complex pore geometries, and therefore, improve the generality and accuracy of our training data set. Using the DNS-computed permeabilities, a physics-informed CNN (PhyCNN) is trained by additionally providing a tailored characteristic quantity of the pore space. More precisely, by exploiting the connection to flow problems on a graph representation of the pore space, additional information about confined structures is provided to the network in terms of the maximum flow value, which is the key innovative component of our workflow. The robustness of this approach is reflected by very high prediction accuracy, which is observed for a variety of sandstone samples from archetypal rock formations.
    Highly efficient reliability analysis of anisotropic heterogeneous slopes: Machine Learning aided Monte Carlo method. (arXiv:2204.06098v1 [cs.LG])
    Machine Learning (ML) algorithms are increasingly used as surrogate models to increase the efficiency of stochastic reliability analyses in geotechnical engineering. This paper presents a highly efficient ML aided reliability technique that is able to accurately predict the results of a Monte Carlo (MC) reliability study, and yet performs 500 times faster. A complete MC reliability analysis on anisotropic heterogeneous slopes consisting of 120,000 simulated samples is conducted in parallel to the proposed ML aided stochastic technique. Comparing the results of the complete MC study and the proposed ML aided technique, the expected errors of the proposed method are realistically examined. Circumventing the time-consuming computation of factors of safety for the training datasets, the proposed technique is more efficient than previous methods. Different ML models, including Random Forest (RF), Support Vector Machine (SVM) and Artificial Neural Networks (ANN) are presented, optimised and compared. The effects of the size and type of training and testing datasets are discussed. The expected errors of the ML predicted probability of failure are characterised by different levels of soil heterogeneity and anisotropy. Using only 1% of MC samples to train ML surrogate models, the proposed technique can accurately predict the probability of failure with mean errors limited to 0.7%. The proposed technique reduces the computational time required for our study from 306 days to only 14 hours, providing 500 times higher efficiency.
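The surrogate idea in the abstract (train an ML model on about 1% of the MC samples, then predict the remainder) can be sketched in a few lines. Everything below uses synthetic stand-in data and an assumed linear factor-of-safety model, not the paper's geotechnical simulations:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Hypothetical stand-in for 120,000 MC slope realisations: features describe
# the random field (e.g. mean strength, anisotropy), target is factor of safety.
X = rng.standard_normal((120_000, 6))
fos = 1.0 + 0.3 * X[:, 0] - 0.2 * X[:, 1] + 0.05 * rng.standard_normal(120_000)

train = rng.choice(len(X), size=1200, replace=False)   # ~1% of the MC samples
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[train], fos[train])

p_fail_mc = (fos < 1.0).mean()                    # full MC reference
p_fail_ml = (model.predict(X) < 1.0).mean()       # surrogate-based estimate
print(f"MC: {p_fail_mc:.4f}  surrogate: {p_fail_ml:.4f}")
```

The speedup comes from replacing the expensive per-sample stability analysis with a cheap model evaluation, so only the small training subset needs full simulations.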
    OntoProtein: Protein Pretraining With Gene Ontology Embedding. (arXiv:2201.11147v3 [q-bio.BM] UPDATED)
    Self-supervised protein language models have proved their effectiveness in learning the proteins representations. With the increasing computational power, current protein language models pre-trained with millions of diverse sequences can advance the parameter scale from million-level to billion-level and achieve remarkable improvement. However, those prevailing approaches rarely consider incorporating knowledge graphs (KGs), which can provide rich structured knowledge facts for better protein representations. We argue that informative biology knowledge in KGs can enhance protein representation with external knowledge. In this work, we propose OntoProtein, the first general framework that makes use of structure in GO (Gene Ontology) into protein pre-training models. We construct a novel large-scale knowledge graph that consists of GO and its related proteins, and gene annotation texts or protein sequences describe all nodes in the graph. We propose novel contrastive learning with knowledge-aware negative sampling to jointly optimize the knowledge graph and protein embedding during pre-training. Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models in TAPE benchmark and yield better performance compared with baselines in protein-protein interaction and protein function prediction. Code and datasets are available in https://github.com/zjunlp/OntoProtein.
    A streamable large-scale clinical EEG dataset for Deep Learning. (arXiv:2203.02552v2 [cs.LG] UPDATED)
    Deep Learning has revolutionized various fields, including Computer Vision, Natural Language Processing, as well as Biomedical research. Within the field of neuroscience, specifically in electrophysiological neuroimaging, researchers are starting to explore leveraging deep learning to make predictions on their data without extensive feature engineering. The availability of large-scale datasets is a crucial aspect of allowing the experimentation of Deep Learning models. We are publishing the first large-scale clinical EEG dataset that simplifies data access and management for Deep Learning. This dataset contains eyes-closed EEG data prepared from a collection of 1,574 juvenile participants from the Healthy Brain Network. We demonstrate a use case integrating this framework, and discuss why providing such neuroinformatics infrastructure to the community is critical for future scientific discoveries.
    Research on Intellectual Property Resource Profile and Evolution Law. (arXiv:2204.06221v1 [cs.DL])
    In the era of big data, intellectual property-oriented scientific and technological resources show a trend of large data scale, high information density and low value density, which brings severe challenges to the effective use of intellectual property resources, and the demand for mining hidden information in intellectual property is increasing. This makes intellectual property-oriented science and technology resource profiling and the analysis of its evolution a current research hotspot. This paper reviews methods for constructing profiles of intellectual property resources, together with the prerequisite steps of property entity extraction and entity completion, from the perspectives of algorithm classification and general workflow, and discusses directions for improving future methods.
    Adjacency constraint for efficient hierarchical reinforcement learning. (arXiv:2111.00213v3 [cs.LG] UPDATED)
    Goal-conditioned Hierarchical Reinforcement Learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques. However, it often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is large. Searching in a large goal space poses difficulty for both high-level subgoal generation and low-level policy learning. In this paper, we show that this problem can be effectively alleviated by restricting the high-level action space from the whole goal space to a $k$-step adjacent region of the current state using an adjacency constraint. We theoretically prove that in a deterministic Markov Decision Process (MDP), the proposed adjacency constraint preserves the optimal hierarchical policy, while in a stochastic MDP the adjacency constraint induces a bounded state-value suboptimality determined by the MDP's transition structure. We further show that this constraint can be practically implemented by training an adjacency network that can discriminate between adjacent and non-adjacent subgoals. Experimental results on discrete and continuous control tasks including challenging simulated robot locomotion and manipulation tasks show that incorporating the adjacency constraint significantly boosts the performance of state-of-the-art goal-conditioned HRL approaches.
    Learning from All Vehicles. (arXiv:2203.11934v2 [cs.RO] UPDATED)
    In this paper, we present a system to train driving policies from experiences collected not just from the ego-vehicle, but all vehicles that it observes. This system uses the behaviors of other agents to create more diverse driving scenarios without collecting additional data. The main difficulty in learning from other vehicles is that there is no sensor information. We use a set of supervisory tasks to learn an intermediate representation that is invariant to the viewpoint of the controlling vehicle. This not only provides a richer signal at training time but also allows more complex reasoning during inference. Learning how all vehicles drive helps predict their behavior at test time and can avoid collisions. We evaluate this system in closed-loop driving simulations. Our system outperforms all prior methods on the public CARLA Leaderboard by a wide margin, improving driving score by 25 and route completion rate by 24 points. Our method won the 2021 CARLA Autonomous Driving challenge. Code and data are available at https://github.com/dotchen/LAV.
    Optimal Membership Inference Bounds for Adaptive Composition of Sampled Gaussian Mechanisms. (arXiv:2204.06106v1 [cs.CR])
    Given a trained model and a data sample, membership-inference (MI) attacks predict whether the sample was in the model's training set. A common countermeasure against MI attacks is to utilize differential privacy (DP) during model training to mask the presence of individual examples. While this use of DP is a principled approach to limit the efficacy of MI attacks, there is a gap between the bounds provided by DP and the empirical performance of MI attacks. In this paper, we derive bounds for the \textit{advantage} of an adversary mounting a MI attack, and demonstrate tightness for the widely-used Gaussian mechanism. We further show bounds on the \textit{confidence} of MI attacks. Our bounds are much stronger than those obtained by DP analysis. For example, analyzing a setting of DP-SGD with $\epsilon=4$ would obtain an upper bound on the advantage of $\approx0.36$ based on our analyses, while getting a bound of $\approx 0.97$ using the analysis of previous work that converts $\epsilon$ to membership inference bounds. Finally, using our analysis, we provide MI metrics for models trained on the CIFAR10 dataset. To the best of our knowledge, our analysis provides state-of-the-art membership inference bounds for privacy.
    LDPC codes: comparing cluster graphs to factor graphs. (arXiv:2204.06350v1 [cs.IT])
    We present a comparison study between a cluster and factor graph representation of LDPC codes. In probabilistic graphical models, cluster graphs retain useful dependence between random variables during inference, which is advantageous in terms of computational cost, convergence speed, and accuracy of marginal probabilities. This study investigates these benefits in the context of LDPC codes and shows that a cluster graph representation outperforms the traditional factor graph representation.
    Deep Learning-based Framework for Automatic Cranial Defect Reconstruction and Implant Modeling. (arXiv:2204.06310v1 [eess.IV])
    The goal of this work is to propose a robust, fast, and fully automatic method for personalized cranial defect reconstruction and implant modeling. We propose a two-step deep learning-based method using a modified U-Net architecture to perform the defect reconstruction, and a dedicated iterative procedure to improve the implant geometry, followed by automatic generation of models ready for 3-D printing. We propose a cross-case augmentation based on imperfect image registration combining cases from different datasets. We perform ablation studies regarding different augmentation strategies and compare them to other state-of-the-art methods. We evaluate the method on three datasets introduced during the AutoImplant 2021 challenge, organized jointly with the MICCAI conference. We perform the quantitative evaluation using the Dice and boundary Dice coefficients, and the Hausdorff distance. The average Dice coefficient, boundary Dice coefficient, and the 95th percentile of Hausdorff distance are 0.91, 0.94, and 1.53 mm respectively. We perform an additional qualitative evaluation by 3-D printing and visualization in mixed reality to confirm the implant's usefulness. We propose a complete pipeline that enables one to create the cranial implant model ready for 3-D printing. The described method is a greatly extended version of the method that scored 1st place in all AutoImplant 2021 challenge tasks. We freely release the source code, that together with the open datasets, makes the results fully reproducible. The automatic reconstruction of cranial defects may enable manufacturing personalized implants in a significantly shorter time, possibly allowing one to perform the 3-D printing process directly during a given intervention. Moreover, we show the usability of the defect reconstruction in mixed reality that may further reduce the surgery time.
    Distributionally Robust Models with Parametric Likelihood Ratios. (arXiv:2204.06340v1 [cs.LG])
    As machine learning models are deployed ever more broadly, it becomes increasingly important that they are not only able to perform well on their training distribution, but also yield accurate predictions when confronted with distribution shift. The Distributionally Robust Optimization (DRO) framework proposes to address this issue by training models to minimize their expected risk under a collection of distributions, to imitate test-time shifts. This is most commonly achieved by instance-level re-weighting of the training objective to emulate the likelihood ratio with possible test distributions, which allows for estimating their empirical risk via importance sampling (assuming that they are subpopulations of the training distribution). However, re-weighting schemes in the literature are usually limited due to the difficulty of keeping the optimization problem tractable and the complexity of enforcing normalization constraints. In this paper, we show that three simple ideas -- mini-batch level normalization, a KL penalty and simultaneous gradient updates -- allow us to train models with DRO using a broader class of parametric likelihood ratios. In a series of experiments on both image and text classification benchmarks, we find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches, and that the method performs reliably well with little hyper-parameter tuning. Code to reproduce our experiments can be found at https://github.com/pmichel31415/P-DRO.
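The three ingredients named in the abstract (mini-batch level normalisation of the weights, a KL penalty, and gradient updates on the adversary) can be illustrated with a toy numpy sketch. This is a schematic re-weighting loop on random per-example losses, assuming a softmax-parameterised adversary with a KL penalty toward uniform weights; it is not the paper's P-DRO implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
losses = rng.exponential(1.0, size=32)   # per-example losses in one minibatch
logits = np.zeros(32)                    # adversary parameters (log-weights)
tau = 1.0                                # strength of the KL penalty

def normalized_weights(logits):
    # Mini-batch level normalisation: softmax over the batch.
    w = np.exp(logits - logits.max())
    return w / w.sum()

for _ in range(200):
    w = normalized_weights(logits)
    # Gradient ascent on  sum_i w_i * loss_i  -  tau * KL(w || uniform)
    a = losses - tau * np.log(len(w) * w)
    grad = w * (a - (w * a).sum())       # softmax chain rule (constants cancel)
    logits += 0.5 * grad

w = normalized_weights(logits)
robust_risk = (w * losses).sum()
print(f"ERM risk: {losses.mean():.3f}  DRO risk: {robust_risk:.3f}")
```

The KL penalty keeps the learned weights close to uniform, which prevents the re-weighted objective from collapsing onto the single worst example.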
    Learning multiobjective rough terrain traversability. (arXiv:2203.16354v2 [cs.RO] UPDATED)
    We present a method that uses high-resolution topography data of rough terrain, and ground vehicle simulation, to predict traversability. Traversability is expressed as three independent measures: the ability to traverse the terrain at a target speed, energy consumption, and acceleration. The measures are continuous and reflect different objectives for planning that go beyond binary classification. A deep neural network is trained to predict the traversability measures from the local heightmap and target speed. To produce training data, we use an articulated vehicle with wheeled bogie suspensions and procedurally generated terrains. We evaluate the model on laser-scanned forest terrains, previously unseen by the model. The model predicts traversability with an accuracy of 90%. Predictions rely on features from the high-dimensional terrain data that surpass local roughness and slope relative to the heading. Correlations show that the three traversability measures are complementary to each other. With an inference speed 3000 times faster than the ground truth simulation and trivially parallelizable, the model is well suited for traversability analysis and optimal path planning over large areas.
    DL4SciVis: A State-of-the-Art Survey on Deep Learning for Scientific Visualization. (arXiv:2204.06504v1 [cs.GR])
    Since 2016, we have witnessed the tremendous growth of artificial intelligence+visualization (AI+VIS) research. However, existing survey papers on AI+VIS focus on visual analytics and information visualization, not scientific visualization (SciVis). In this paper, we survey related deep learning (DL) works in SciVis, specifically in the direction of DL4SciVis: designing DL solutions for solving SciVis problems. To stay focused, we primarily consider works that handle scalar and vector field data but exclude mesh data. We classify and discuss these works along six dimensions: domain setting, research task, learning type, network architecture, loss function, and evaluation metric. The paper concludes with a discussion of the remaining gaps to fill along the discussed dimensions and the grand challenges we need to tackle as a community. This state-of-the-art survey guides SciVis researchers in gaining an overview of this emerging topic and points out future directions to grow this research.
    Scalable Training of Language Models using JAX pjit and TPUv4. (arXiv:2204.06514v1 [cs.LG])
    Modern large language models require distributed training strategies due to their size. The challenges of efficiently and robustly training them are met with rapid developments on both software and hardware frontiers. In this technical report, we explore challenges and design decisions associated with developing a scalable training framework, and present a quantitative analysis of efficiency improvements coming from adopting new software and hardware solutions.
    Sentiment Analysis of Political Tweets for Israel using Machine Learning. (arXiv:2204.06515v1 [cs.IR])
    Sentiment analysis is a vital research topic in computer science. With the accelerated development of information technology and social networks, a massive amount of comment text has been generated on web applications and social media platforms such as Twitter. As a result, people actively spread both general information and politically opinionated content, which makes public reactions an important subject of analysis. Most researchers have used social media features or content to analyze and predict public opinion concerning political events. This research proposes an analytical study using Israeli political Twitter data to interpret public opinion towards the Palestinian-Israeli conflict. The attitudes of ethnic groups and opinion leaders, expressed as tweets, are analyzed using machine learning algorithms such as Support Vector Classifier (SVC), Decision Tree (DT), and Naive Bayes (NB). Finally, a comparative analysis is conducted based on the experimental results of the different models.
    Deep Probabilistic Time Series Forecasting using Augmented Recurrent Input for Dynamic Systems. (arXiv:2106.05848v2 [cs.LG] UPDATED)
    The demand for probabilistic time series forecasting has recently risen in various dynamic system scenarios, for example, system identification and prognostics and health management of machines. To this end, we combine advances in both deep generative models and state space models (SSM) to develop a novel, data-driven deep probabilistic sequence model. Specifically, we follow the popular encoder-decoder generative structure to build a recurrent neural network (RNN)-assisted variational sequence model on an augmented recurrent input space, which can induce rich stochastic sequence dependency. Besides, to alleviate the inconsistency of the posterior between training and prediction and to improve the mining of dynamic patterns, we (i) propose using a lagged hybrid output as the input to the posterior at the next time step, which brings training and prediction into alignment; and (ii) further devise a generalized auto-regressive strategy that encodes all historical dependencies for the posterior. We first investigate the methodological characteristics of the proposed deep probabilistic sequence model on toy cases, and then comprehensively demonstrate its superiority over existing deep probabilistic SSM models through extensive numerical experiments on eight system identification benchmarks from various dynamic systems. Finally, we apply our sequence model to a real-world centrifugal compressor forecasting problem and again verify its outstanding performance by quantifying the time series predictive distribution.
    COIL: Constrained Optimization in Learned Latent Space -- Learning Representations for Valid Solutions. (arXiv:2202.02163v3 [cs.NE] UPDATED)
    Constrained optimization problems can be difficult because their search spaces have properties not conducive to search, e.g., multimodality, discontinuities, or deception. To address such difficulties, considerable research has been performed on creating novel evolutionary algorithms or specialized genetic operators. However, if the representation that defines the search space could be altered so that it only permits valid solutions satisfying the constraints, the task of finding the optimum would become more feasible without any need for specialized optimization algorithms. We propose Constrained Optimization in Latent Space (COIL), which uses a VAE to learn a latent representation from a dataset of samples drawn from the valid region of the search space according to a constraint, thus enabling the optimizer to search for the objective in the new space defined by the learned representation. Preliminary experiments show promise: compared to an identical GA using a standard representation, which cannot meet the constraints or find fit solutions, COIL with its learned latent representation can perfectly satisfy different types of constraints while finding high-fitness solutions.
    Rethinking Reconstruction Autoencoder-Based Out-of-Distribution Detection. (arXiv:2203.02194v2 [cs.CV] UPDATED)
    In some scenarios, a classifier must detect out-of-distribution samples far from its training data. Owing to their desirable characteristics, reconstruction autoencoder-based methods address this problem by using the input reconstruction error as a metric of novelty vs. normality. We formulate the essence of this approach as a quadruplet domain translation with an intrinsic bias to query only a proxy of conditional data uncertainty. Accordingly, an improvement direction is formalized as maximally compressing the autoencoder's latent space while preserving the reconstructive power needed to act as the described domain translator. From this, we introduce strategies including semantic reconstruction, data certainty decomposition, and a normalized L2 distance, which together substantially improve the original methods and establish state-of-the-art performance on various benchmarks; e.g., the FPR@95%TPR of CIFAR-100 vs. TinyImagenet-crop on Wide-ResNet is 0.2%. Importantly, our method works without additional data, hard-to-implement structures, or time-consuming pipelines, and without harming the classification accuracy on known classes.
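As a toy illustration of one ingredient mentioned in the abstract, a normalized L2 reconstruction distance can serve as a novelty score. The normalization below (dividing by the input norm) is an assumption for the sketch, not the paper's exact definition:

```python
import numpy as np

def ood_score(x, recon):
    """Normalized L2 distance between an input and its autoencoder
    reconstruction, usable as an out-of-distribution score: scaling by the
    input norm makes scores comparable across inputs of different
    magnitude. (Illustrative normalization, not the paper's exact formula.)
    """
    x, recon = np.asarray(x, float), np.asarray(recon, float)
    return float(np.linalg.norm(x - recon) / (np.linalg.norm(x) + 1e-12))
```

A perfect reconstruction scores 0, while a reconstruction that misses the input entirely scores about 1, so a threshold on this score separates in-distribution from novel inputs.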
    Reinforced MOOCs Concept Recommendation in Heterogeneous Information Networks. (arXiv:2203.11011v2 [cs.IR] UPDATED)
    Massive open online courses (MOOCs), which provide large-scale interactive participation and open access via the web, are becoming a popular means of online and distance education. To help users have a better study experience, many MOOC platforms provide services that recommend courses to users. However, we argue that directly recommending a course to a user ignores the expertise levels of different users. To fill this gap, this paper studies the problem of concept recommendation at a finer granularity. We propose a novel Heterogeneous Information Networks based Concept Recommender with Reinforcement Learning (HinCRec-RL) for concept recommendation in MOOCs. Specifically, we first formulate concept recommendation in MOOCs as a reinforcement learning problem to better model the dynamic interaction between users and knowledge concepts. In addition, to mitigate the data sparsity issue that also exists in many other recommendation tasks, we consider a heterogeneous information network (HIN) among users, courses, videos, and concepts to better learn the semantic representation of users. In particular, we use meta-paths on the HIN to guide the propagation of users' preferences and propose a heterogeneous graph attention network to represent the meta-paths. To validate the effectiveness of our approach, we conduct comprehensive experiments on a real-world dataset from XuetangX, a popular MOOC platform in China. The promising results show that our proposed approach outperforms other baselines.
    A Unified Cascaded Encoder ASR Model for Dynamic Model Sizes. (arXiv:2204.06164v1 [eess.AS])
    In this paper, we propose a dynamic cascaded encoder Automatic Speech Recognition (ASR) model, which unifies models for different deployment scenarios. Moreover, the model can significantly reduce model size and power consumption without loss of quality. Namely, with the dynamic cascaded encoder model, we explore three techniques to maximally boost the performance of each model size: 1) Use separate decoders for each sub-model while sharing the encoders; 2) Use funnel-pooling to improve the encoder efficiency; 3) Balance the size of causal and non-causal encoders to improve quality and fit deployment constraints. Overall, the proposed large-medium model has 30% smaller size and reduces power consumption by 33%, compared to the baseline cascaded encoder model. The triple-size model that unifies the large, medium, and small models achieves 37% total size reduction with minimal quality loss, while substantially reducing the engineering efforts of having separate models.
    Adaptive Height Optimisation for Cellular-Connected UAVs using Reinforcement Learning. (arXiv:2007.13695v3 [eess.SP] UPDATED)
    Providing reliable connectivity to cellular-connected UAVs can be very challenging; their performance highly depends on the nature of the surrounding environment, such as the density and heights of the ground BSs. On the other hand, tall buildings might block undesired interference signals from ground BSs, thereby improving connectivity between UAVs and their serving BSs. To address UAV connectivity in such environments, this paper proposes an RL algorithm to dynamically optimise the height of a UAV as it moves through the environment, with the goal of increasing the throughput or spectral efficiency it experiences. The proposed solution is evaluated in two settings: a series of generated environments in which we vary the BS and building densities, and a scenario using real-world data obtained from an experiment in Dublin, Ireland. Results show that our proposed RL-based solution improves UAV QoS by 6% to 41%, depending on the scenario. We also conclude that, when flying higher than the buildings, variation in building density has no impact on UAV QoS. On the other hand, BS density can negatively impact UAV QoS, with more BSs generating more interference and degrading UAV performance.
    QU-net++: Image Quality Detection Framework for Segmentation of Medical 3D Image Stacks. (arXiv:2110.14181v4 [eess.IV] UPDATED)
    Automated segmentation of pathological regions of interest aids medical image diagnostics and follow-up care. However, accurate pathological segmentation requires high-quality annotated data, which can be both costly and time-intensive to generate. In this work, we propose an automated two-step method that detects the minimal image subset required to train segmentation models by evaluating the quality of medical images from 3D image stacks with a U-net++ model. The images that indicate a lack of training quality can then be annotated and used to fully train a U-net-based segmentation model. The proposed QU-net++ model detects this lack of training quality from the disagreement between the segmentations produced by its final two output layers. The model isolates around 10% of the slices per 3D image stack and scales across imaging modalities, segmenting cysts in OCT images and ground-glass opacity (GGO) in lung CT images with Dice scores in the range 0.56-0.72. Thus, the proposed method can be applied to cost-effective multi-modal pathology segmentation tasks.
    Why KDAC? A general activation function for knowledge discovery. (arXiv:2111.13858v3 [cs.LG] UPDATED)
    Deep learning-oriented named entity recognition (DNER) has gradually become the paradigm of knowledge discovery, greatly promoting domain intelligence. However, current activation functions for DNER fail to address gradient vanishing, the absence of negative outputs, or non-differentiable points, which may impede knowledge exploration through the omission and incomplete representation of latent semantics. To break through this dilemma, we present a novel activation function termed KDAC. In detail, KDAC is an aggregation function with multiple conversion modes. The backbone of the activation region is the interaction between an exponential and a linear component, and both ends extend through adaptive linear divergence, which overcomes gradient vanishing and the lack of negative outputs. Crucially, non-differentiable points are detected and eliminated by an approximate smoothing algorithm. KDAC has a series of attractive properties, including nonlinearity, a stable near-linear transformation and derivative, and a dynamic style. We perform experiments with a BERT-BiLSTM-CNN-CRF model on six benchmark datasets covering different domain knowledge: Weibo, Clinical, E-commerce, Resume, HAZOP, and People's Daily. The evaluation results show that KDAC is advanced and effective and can provide more generalized activation to boost the performance of DNER. We hope that KDAC can serve as a promising activation function for the construction of knowledge.
    Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability. (arXiv:2204.06425v1 [cs.SE])
    Machine learning models have been widely developed, released, and adopted in numerous applications. Meanwhile, the documentation practice for machine learning models often falls short of established practices for traditional software components, which impedes model accountability, inadvertently abets inappropriate use or misuse of models, and may trigger negative social impact. Recently, model cards, a template for documenting machine learning models, have attracted notable attention, but their impact on the practice of model documentation is unclear. In this work, we examine publicly available model cards and other similar documentation. Our analysis reveals a substantial gap between the suggestions made in the original model card work and the content in actual documentation. Motivated by this observation and by literature in fields such as software documentation, interaction design, and traceability, we further propose a set of design guidelines that aim to support the documentation practice for machine learning models, including (1) the collocation of the documentation environment with the coding environment, (2) nudging the consideration of model card sections during model development, and (3) documentation derived from and traced to the source. We designed a prototype tool named DocML following these guidelines to support model development in computational notebooks. A lab study reveals the benefit of our tool in shifting the behavior of data scientists towards documentation quality and accountability.
    Modelling Evolutionary and Stationary User Preferences for Temporal Sets Prediction. (arXiv:2204.05490v2 [cs.LG] UPDATED)
    Given a sequence of sets, where each set is associated with a timestamp and contains an arbitrary number of elements, the task of temporal sets prediction aims to predict the elements in the subsequent set. Previous studies for temporal sets prediction mainly capture each user's evolutionary preference by learning from his/her own sequence. Although insightful, we argue that: 1) the collaborative signals latent in different users' sequences are essential but have not been exploited; 2) users also tend to show stationary preferences, which existing methods fail to consider. To this end, we propose an integrated learning framework to model both the evolutionary and the stationary preferences of users for temporal sets prediction, which first constructs a universal sequence by chronologically arranging all the user-set interactions, and then learns on each user-set interaction. In particular, for each user-set interaction, we first design an evolutionary user preference modelling component to track the user's time-evolving preference and exploit the latent collaborative signals among different users. This component maintains a memory bank to store memories of the related user and elements, and continuously updates their memories based on the currently encoded messages and the past memories. Then, we devise a stationary user preference modelling module to discover each user's personalized characteristics according to the historical sequence, which adaptively aggregates the previously interacted elements from dual perspectives with the guidance of the user's and elements' embeddings. Finally, we develop a set-batch algorithm to improve the model efficiency, which can create time-consistent batches in advance and achieve 3.5x training speedups on average. Experiments on real-world datasets demonstrate the effectiveness and good interpretability of our approach.
    AHP: Learning to Negative Sample for Hyperedge Prediction. (arXiv:2204.06353v1 [cs.LG])
    Hypergraphs (i.e., sets of hyperedges) naturally represent group relations (e.g., researchers co-authoring a paper and ingredients used together in a recipe), each of which corresponds to a hyperedge (i.e., a subset of nodes). Predicting future or missing hyperedges has significant implications for many applications (e.g., collaboration and recipe recommendation). What makes hyperedge prediction particularly challenging is the vast number of non-hyperedge subsets, which grows exponentially with the number of nodes. Since it is prohibitive to use all of them as negative examples for model training, it is inevitable to sample a very small portion of them, and to this end, heuristic sampling schemes have been employed. However, trained models suffer from poor generalization capability for examples of different natures. In this paper, we propose AHP, an adversarial training-based hyperedge-prediction method. It learns to sample negative examples without relying on any heuristic schemes. Using six real hypergraphs, we show that AHP generalizes better to negative examples of various natures. It yields up to 28.2% higher AUROC than the best existing methods and often even outperforms its variants with sampling schemes tailored to test sets.
    Label Augmentation with Reinforced Labeling for Weak Supervision. (arXiv:2204.06436v1 [cs.LG])
    Weak supervision (WS) is an alternative to traditional supervised learning that addresses the need for ground truth. Data programming is a practical WS approach that allows programmatically labeling data samples using labeling functions (LFs) instead of hand-labeling each data point. However, the existing approach fails to fully exploit the domain knowledge encoded in the LFs, especially when the LFs' coverage is low. This is due to the common data programming pipeline, which neglects to utilize data features during the generative process. This paper proposes a new approach called reinforced labeling (RL). Given an unlabeled dataset and a set of LFs, RL augments the LFs' outputs to cases not covered by the LFs based on similarities among samples. Thus, RL can lead to higher labeling coverage for training an end classifier. Experiments on several domains (classification of YouTube comments, wine quality, and weather prediction) show considerable gains. The new approach produces significant performance improvement, leading up to +21 points in accuracy and +61 points in F1 score compared to the state-of-the-art data programming approach.
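The core idea, extending labeling-function outputs to uncovered samples based on similarity among samples, can be sketched as below. The cosine-similarity rule, the threshold, and the nearest-covered-neighbor choice are illustrative assumptions, not the paper's exact augmentation procedure:

```python
import numpy as np

def reinforce_labels(X, weak_labels, sim_threshold=0.9):
    """Propagate labeling-function outputs (-1 = abstain) to uncovered
    samples from their most similar covered neighbor, using cosine
    similarity. Illustrative sketch of the reinforced-labeling idea."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit-normalize rows
    labels = weak_labels.copy()
    covered = np.where(weak_labels != -1)[0]
    for i in np.where(weak_labels == -1)[0]:
        sims = X[covered] @ X[i]                       # cosine similarities
        j = int(np.argmax(sims))
        if sims[j] >= sim_threshold:                   # close enough: copy label
            labels[i] = weak_labels[covered[j]]
    return labels
```

Samples near a covered point inherit its label, raising coverage; samples far from every covered point remain abstained, which keeps the augmentation conservative.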
    Clinical trial site matching with improved diversity using fair policy learning. (arXiv:2204.06501v1 [cs.LG])
    The ongoing pandemic has highlighted the importance of reliable and efficient clinical trials in healthcare. Trial sites, where the trials are conducted, are chosen mainly based on feasibility in terms of medical expertise and access to a large group of patients. More recently, the issue of diversity and inclusion in clinical trials is gaining importance. Different patient groups may experience the effects of a medical drug/treatment differently and hence need to be included in the clinical trials. These groups could be based on ethnicity, co-morbidities, age, or economic factors. Thus, designing a method for trial site selection that accounts for both feasibility and diversity is a crucial and urgent goal. In this paper, we formulate this problem as a ranking problem with fairness constraints. Using principles of fairness in machine learning, we learn a model that maps a clinical trial description to a ranked list of potential trial sites. Unlike existing fairness frameworks, the group membership of each trial site is non-binary: each trial site may have access to patients from multiple groups. We propose fairness criteria based on demographic parity to address such a multi-group membership scenario. We test our method on 480 real-world clinical trials and show that our model results in a list of potential trial sites that provides access to a diverse set of patients while also ensuring a high number of enrolled patients.
    Distilling the Knowledge of Romanian BERTs Using Multiple Teachers. (arXiv:2112.12650v3 [cs.CL] UPDATED)
    Running large-scale pre-trained language models in computationally constrained environments remains a challenging problem yet to be addressed, while transfer learning from these models has become prevalent in Natural Language Processing tasks. Several solutions, including knowledge distillation, network quantization, and network pruning, have been previously proposed; however, these approaches focus mostly on the English language, thus widening the gap when considering low-resource languages. In this work, we introduce three light and fast versions of distilled BERT models for the Romanian language: Distil-BERT-base-ro, Distil-RoBERT-base, and DistilMulti-BERT-base-ro. The first two models resulted from the individual distillation of knowledge from two base versions of Romanian BERTs available in the literature, while the last one was obtained by distilling their ensemble. To our knowledge, this is the first attempt to create publicly available Romanian distilled BERT models, which were thoroughly evaluated on five tasks: part-of-speech tagging, named entity recognition, sentiment analysis, semantic textual similarity, and dialect identification. Our experimental results argue that the three distilled models offer performance comparable to their teachers, while being twice as fast on a GPU and ~35% smaller. In addition, we further test the similarity between the predictions of our students and their teachers by measuring their label and probability loyalty, together with regression loyalty, a new metric introduced in this work.
    Estimation of stellar atmospheric parameters from LAMOST DR8 low-resolution spectra with 20$\leq$SNR$<$30. (arXiv:2204.06301v1 [astro-ph.GA])
    The accuracy of estimated stellar atmospheric parameters decreases markedly with decreasing spectral signal-to-noise ratio (SNR), and a huge number of observations fall in this regime, especially those with SNR$<$30. It is therefore helpful to improve parameter estimation for these spectra, and this work studies the ($T_\texttt{eff}, \log~g$, [Fe/H]) estimation problem for LAMOST DR8 low-resolution spectra with 20$\leq$SNR$<$30. We propose a data-driven method based on machine learning techniques. First, the scheme detects stellar atmospheric parameter-sensitive features in the spectra with the Least Absolute Shrinkage and Selection Operator (LASSO), rejecting ineffective data components and irrelevant data. Second, a Multi-Layer Perceptron (MLP) estimates the stellar atmospheric parameters from the LASSO features. Finally, the performance of LASSO-MLP is evaluated by computing and analyzing the consistency between its estimates and the reference values from APOGEE (Apache Point Observatory Galactic Evolution Experiment) high-resolution spectra. Experiments show that the Mean Absolute Errors (MAE) of $T_\texttt{eff}, \log~g$, [Fe/H] are reduced from LASP (137.6 K, 0.195 dex, 0.091 dex) to LASSO-MLP (84.32 K, 0.137 dex, 0.063 dex), indicating clear improvements in stellar atmospheric parameter estimation. In addition, this work uses LASSO-MLP to estimate the stellar atmospheric parameters of 1,162,760 low-resolution spectra with 20$\leq$SNR$<$30 from LAMOST DR8, and releases the estimation catalog, learned model, experimental code, training data, and test data for scientific exploration and algorithm study.
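The two-stage pipeline described in the abstract (LASSO feature selection followed by an MLP regressor) can be sketched with scikit-learn on synthetic stand-in "spectra". All shapes, seeds, and hyper-parameters below are illustrative, not the paper's configuration:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data: 200 "spectra" of 50 "pixels" each; the label
# (a stand-in for, e.g., Teff) depends only on pixels 3 and 17.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = 3.0 * X[:, 3] - 2.0 * X[:, 17] + 0.1 * rng.normal(size=200)

# Stage 1: LASSO zeroes out uninformative pixels; the surviving
# coefficients mark the parameter-sensitive features.
lasso = Lasso(alpha=0.1).fit(X, y)
selected = np.where(np.abs(lasso.coef_) > 1e-6)[0]

# Stage 2: an MLP regresses the parameter from the selected features only.
mlp = MLPRegressor(hidden_layer_sizes=(32,), solver="lbfgs",
                   max_iter=2000, random_state=0).fit(X[:, selected], y)
```

On this toy target the LASSO step recovers the two informative pixels, and restricting the MLP to them plays the same role as rejecting ineffective data components in the paper's scheme.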
    Autonomy and Perception for Space Mining. (arXiv:2109.12109v3 [cs.RO] UPDATED)
    Future Moon bases will likely be constructed using resources mined from the surface of the Moon. The difficulty of maintaining a human workforce on the Moon and communications lag with Earth means that mining will need to be conducted using collaborative robots with a high degree of autonomy. In this paper, we describe our solution for Phase 2 of the NASA Space Robotics Challenge, which provided a simulated lunar environment in which teams were tasked to develop software systems to achieve autonomous collaborative robots for mining on the Moon. Our 3rd place and innovation award winning solution shows how machine learning-enabled vision could alleviate major challenges posed by the lunar environment towards autonomous space mining, chiefly the lack of satellite positioning systems, hazardous terrain, and delicate robot interactions. A robust multi-robot coordinator was also developed to achieve long-term operation and effective collaboration between robots.
    A Statistical Learning View of Simple Kriging. (arXiv:2202.07365v3 [stat.ML] UPDATED)
    In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly and guarantees of the generalization capacity of predictive rules learned from such data are left to establish. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1,\; \ldots,\; s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non i.i.d. nature of the spatial data $X_{s_1},\; \ldots,\; X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and hopefully pave the way for further developments in statistical learning based on spatial data.
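The simple kriging predictor analyzed above has a closed form: the prediction at a query location is $\lambda^\top X_{obs}$ with $\lambda = K^{-1}k$, where $K$ collects covariances among observation sites and $k$ the covariances to the query. A minimal numpy sketch for a zero-mean field with a known isotropic Gaussian covariance (the covariance choice is illustrative) is:

```python
import numpy as np

def simple_kriging(obs_locs, obs_vals, query, cov):
    """Simple kriging predictor for a zero-mean field with known
    covariance function `cov`: predict X(query) = lambda^T X_obs with
    lambda = K^{-1} k, K_ij = cov(s_i, s_j), k_i = cov(s_i, query)."""
    K = np.array([[cov(a, b) for b in obs_locs] for a in obs_locs])
    k = np.array([cov(a, query) for a in obs_locs])
    lam = np.linalg.solve(K, k)          # kriging weights
    return float(lam @ np.asarray(obs_vals))

# Isotropic stationary Gaussian covariance, as in the paper's setting
# (unit scale parameters assumed for illustration).
cov = lambda a, b: np.exp(-np.sum((np.asarray(a) - np.asarray(b)) ** 2))
```

The predictor interpolates exactly at observed locations and shrinks toward the zero mean far from the data, which is the behavior the excess-risk analysis of the plug-in rule concerns.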
    Generalization Error Bounds for Multiclass Sparse Linear Classifiers. (arXiv:2204.06264v1 [math.ST])
    We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
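For the global-sparsity case, an $\ell_1$-penalized multinomial logistic fit can be sketched with scikit-learn. The synthetic data, penalty strength, and solver below are illustrative stand-ins for the paper's penalized maximum-likelihood procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic 3-class problem: 10 features, only the first 3 informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))
y = np.argmax(X[:, :3], axis=1)

# l1 penalty (global sparsity): entries of the 3x10 coefficient matrix
# corresponding to noise features are driven exactly to zero by the
# saga proximal updates.
clf = LogisticRegression(penalty="l1", solver="saga", C=0.5,
                         max_iter=5000).fit(X, y)
```

Under global sparsity the penalty acts entrywise on the coefficient matrix; the row-wise and low-rank sparsity notions discussed in the paper would instead penalize whole rows or the matrix rank.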
    Disentangling Autoencoders (DAE). (arXiv:2202.09926v2 [cs.LG] UPDATED)
    Noting the importance of factorizing (or disentangling) the latent space, we propose a novel, non-probabilistic disentangling framework for autoencoders, based on the principles of symmetry transformations in group theory. To the best of our knowledge, this is the first deterministic model aiming to achieve disentanglement with autoencoders and without regularizers. The proposed model is compared to seven state-of-the-art autoencoder-based generative models and evaluated on five supervised disentanglement metrics. The experimental results show that the proposed model can achieve better disentanglement when the variances of the individual features differ. We believe this model opens a new direction for disentanglement learning based on autoencoders without regularizers.
    Reinforcement Learning on Graph: A Survey. (arXiv:2204.06127v1 [cs.LG])
    Graph mining tasks arise in many application domains, including social networks, transportation, and E-commerce, and have received great attention from the theory and algorithm design communities in recent years; there has also been pioneering work applying the intensively studied reinforcement learning (RL) techniques to graph data mining tasks. However, these graph mining algorithms and RL models are dispersed across different research areas, which makes it hard to compare them with each other. In this survey, we provide a comprehensive overview of RL models and graph mining and generalize these algorithms under the unified formulation of Graph Reinforcement Learning (GRL). We further discuss the applications of GRL methods across various domains and summarize the method descriptions, open-source code, and benchmark datasets of GRL methods. Finally, we propose important directions and open challenges to be addressed in the future. This is the most recent comprehensive survey of the GRL literature, providing a global view for researchers as well as a learning resource for those outside the domain. In addition, we create an online open-source repository both for interested researchers who want to enter this rapidly developing domain and for experts who would like to compare GRL methods.
    Production federated keyword spotting via distillation, filtering, and joint federated-centralized training. (arXiv:2204.06322v1 [eess.AS])
    We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones. To compensate for data domains that are missing from on-device training caches, we employed joint federated-centralized training, and to learn in the absence of curated labels on-device, we formulated a confidence-filtering strategy based on user-feedback signals for federated distillation. These techniques created models that significantly improved quality metrics in offline evaluations and user-experience metrics in live A/B experiments.
    Modeling and Analysis of Intermittent Federated Learning Over Cellular-Connected UAV Networks. (arXiv:2110.07077v3 [cs.LG] UPDATED)
    Federated learning (FL) is a promising distributed learning technique particularly suitable for wireless learning scenarios since it can accomplish a learning task without raw data transportation, thereby preserving data privacy and lowering network resource consumption. However, current works on FL over wireless networks do not thoroughly study its fundamental performance when communication outages occur due to channel impairment and network interference. To accurately characterize the performance of FL over wireless networks, this paper proposes a novel intermittent FL model over a cellular-connected unmanned aerial vehicle (UAV) network, which characterizes communication outages from the UAVs (clients) to their server and data heterogeneity among the datasets at the UAVs. We propose an analytically tractable framework to derive the uplink outage probability and use it to devise a simulation-based approach for evaluating the performance of the proposed intermittent FL model. Our findings reveal how the intermittent FL model is affected by uplink communication outages and UAV deployment. Extensive numerical simulations show the consistency between the simulated and analytical performance of the proposed intermittent FL model.
    FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning. (arXiv:2204.05562v2 [cs.LG] UPDATED)
    The rapid development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and existing frameworks such as TFF and FATE have made deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of an FGL-dedicated framework increases the effort required for reproducible research and real-world deployment. Motivated by this strong demand, in this paper we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package, FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) a comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G through extensive experiments, which also yield many valuable insights about FGL for the community. Moreover, we employ FS-G to serve FGL applications in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at https://github.com/alibaba/FederatedScope to promote FGL research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.
    Baseline Computation for Attribution Methods Based on Interpolated Inputs. (arXiv:2204.06120v1 [cs.CV])
    We discuss a way to find a well-behaved baseline for attribution methods that feed a neural network a sequence of interpolated inputs between two given inputs. We then test it with our novel Riemann-Stieltjes Integrated Gradient-weighted Class Activation Mapping (RSI-Grad-CAM) attribution method.
    Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity. (arXiv:2204.06242v1 [stat.ML])
    Many real-world systems are described not only by data from a single source but via multiple data views. In genomic medicine, for example, a patient can be described by data from different molecular layers. This raises the need for multi-view models that are able to disentangle variation within and across data views in an interpretable manner. Latent variable models with structured sparsity are a commonly used tool for this modeling task, but interpretability is cumbersome, since it requires direct inspection and interpretation of each factor by a specialized domain expert. Here, we propose MuVI, a novel approach for domain-informed multi-view latent variable models, facilitating the analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) is able to integrate noisy domain expertise in the form of feature sets, (ii) is robust to noise in the encoded domain knowledge, (iii) results in identifiable factors, and (iv) is able to infer interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
    A quantum generative model for multi-dimensional time series using Hamiltonian learning. (arXiv:2204.06150v1 [quant-ph])
    Synthetic data generation has proven to be a promising solution for addressing data availability issues in various domains. Even more challenging is the generation of synthetic time series data, where one has to preserve temporal dynamics, i.e., the generated time series must respect the original relationships between variables across time. Recently proposed techniques such as generative adversarial networks (GANs) and quantum-GANs lack the ability to attend to the time series specific temporal correlations adequately. We propose using the inherent nature of quantum computers to simulate quantum dynamics as a technique to encode such features. We start by assuming that a given time series can be generated by a quantum process, after which we proceed to learn that quantum process using quantum machine learning. We then use the learned model to generate out-of-sample time series and show that it captures unique and complex features of the learned time series. We also study the class of time series that can be modeled using this technique. Finally, we experimentally demonstrate the proposed algorithm on an 11-qubit trapped-ion quantum machine.
    Organization of a Latent Space structure in VAE/GAN trained by navigation data. (arXiv:2102.01852v3 [cs.LG] UPDATED)
    We present a novel artificial cognitive mapping system using generative deep neural networks, namely a variational autoencoder/generative adversarial network (VAE/GAN), which can map input images to latent vectors and generate temporal sequences internally. The results show that, after training, the distance between predicted images is reflected in the distance between the corresponding latent vectors. This indicates that the latent space is self-organized to reflect the proximity structure of the dataset and may provide a mechanism through which many aspects of cognition are spatially represented. The present study allows the network to internally generate temporal sequences analogous to the hippocampal replay/pre-play ability: the VAE alone produces only near-accurate replays of past experiences, but introducing the GAN couples the generated sequences with instability and novelty.
    FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations. (arXiv:2204.06508v1 [cs.CL])
    Despite recent improvements in abstractive summarization, most current approaches generate summaries that are not factually consistent with the source document, severely restricting their trust and usage in real-world applications. Recent works have shown promising improvements in factuality error identification using text or dependency arc entailments; however, they do not consider the entire semantic graph simultaneously. To this end, we propose FactGraph, a method that decomposes the document and the summary into structured meaning representations (MR), which are more suitable for factuality evaluation. MRs describe core semantic concepts and their relations, aggregating the main content in both document and summary in a canonical form, and reducing data sparsity. FactGraph encodes such graphs using a graph encoder augmented with structure-aware adapters to capture interactions among the concepts based on the graph connectivity, along with text representations using an adapter-based text encoder. Experiments on different benchmarks for evaluating factuality show that FactGraph outperforms previous approaches by up to 15%. Furthermore, FactGraph improves performance on identifying content verifiability errors and better captures subsentence-level factual inconsistencies.
    Challenges and Opportunities of Edge AI for Next-Generation Implantable BMIs. (arXiv:2204.02362v2 [cs.AI] UPDATED)
    Neuroscience and neurotechnology are currently being revolutionized by artificial intelligence (AI) and machine learning. AI is widely used to study and interpret neural signals (analytical applications), assist people with disabilities (prosthetic applications), and treat underlying neurological symptoms (therapeutic applications). In this brief, we will review the emerging opportunities of on-chip AI for the next-generation implantable brain-machine interfaces (BMIs), with a focus on state-of-the-art prosthetic BMIs. Major technological challenges for the effectiveness of AI models will be discussed. Finally, we will present algorithmic and IC design solutions to enable a new generation of AI-enhanced and high-channel-count BMIs.
    On the dynamics of credit history and social interaction features, and their impact on creditworthiness assessment performance. (arXiv:2204.06122v1 [cs.SI])
    For more than half a century, credit risk management has used credit scoring models in each of its well-defined stages. Application scoring is used to decide whether or not to grant a credit, while behavioral scoring is used mainly for portfolio management and to take preventive actions in case of default signals. In both cases, network data has recently been shown to be valuable for increasing the predictive power of these models, especially when the borrower's historical data is scarce or not available. This study aims to understand the dynamics of creditworthiness assessment performance and how it is influenced by credit history, repayment behavior, and social network features. To accomplish this, we introduce a machine learning classification framework to analyze 97,000 individuals and companies from the moment they obtained their first loan to 12 months afterward. Our novel and massive dataset allows us to characterize each borrower according to their credit behavior and their social and economic relationships. Our research shows that borrowers' history increases performance at a decreasing rate during the first six months and then stabilizes. The most notable effect of social network features on performance occurs at loan application; in personal scoring this effect persists for a few months, while in business scoring it adds value throughout the study period. These findings are of great value for improving credit risk management and optimizing the use of traditional information and alternative data sources.
    Receptive Field Analysis of Temporal Convolutional Networks for Monaural Speech Dereverberation. (arXiv:2204.06439v1 [cs.SD])
    Speech dereverberation is often an important requirement in robust speech processing tasks. Supervised deep learning (DL) models give state-of-the-art performance for single-channel speech dereverberation. Temporal convolutional networks (TCNs) are commonly used for sequence modelling in speech enhancement tasks. A feature of TCNs is that they have a receptive field (RF), dependent on the specific model configuration, which determines the number of input frames that can be observed to produce an individual output frame. It has been shown that TCNs are capable of performing dereverberation of simulated speech data; however, a thorough analysis, especially with a focus on the RF, is still lacking in the literature. This paper analyses dereverberation performance depending on the model size and the RF of TCNs. Experiments using the WHAMR corpus, which is extended to include room impulse responses (RIRs) with larger RT60 values, demonstrate that a larger RF can yield significant performance improvements when training smaller TCN models. It is also demonstrated that TCNs benefit from a wider RF when dereverberating RIRs with larger RT60 values.
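The receptive field the abstract refers to follows directly from the kernel size and dilation pattern of the TCN. A minimal sketch of that calculation, assuming a standard dilation-doubling TCN (the configuration values below are illustrative, not the paper's):

```python
def tcn_receptive_field(kernel_size, n_blocks, n_repeats):
    """Receptive field (in frames) of a dilated TCN.

    Each conv block with kernel size k and dilation d widens the
    receptive field by (k - 1) * d; dilations double within a repeat
    (1, 2, 4, ..., 2**(n_blocks - 1)), and the stack is repeated.
    """
    rf = 1
    for _ in range(n_repeats):
        for b in range(n_blocks):
            rf += (kernel_size - 1) * (2 ** b)
    return rf

# e.g. a common configuration: kernel 3, 8 blocks per repeat, 3 repeats
print(tcn_receptive_field(3, 8, 3))  # 1 + 3 * 2 * 255 = 1531 frames
```

Growing the dilation stack is the cheapest way to widen the RF without adding parameters, which is the trade-off such an RF analysis examines.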
    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information. (arXiv:2103.07084v2 [stat.ML] UPDATED)
    Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an approach often suffers from the bias of the gradient estimator induced by value function approximation. In this study, we propose a novel method that can learn diverse solutions without suffering the bias problem. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. Subsequently, we show that our method enables more effective few-shot adaptation compared with existing methods.
    A pipeline and comparative study of 12 machine learning models for text classification. (arXiv:2204.06518v1 [cs.IR])
    Text-based communication is highly favoured as a communication method, especially in business environments. As a result, it is often abused by sending malicious messages, e.g., spam emails, to deceive users into relaying personal information, including online account credentials or banking details. For this reason, many machine learning methods for text classification have been proposed and incorporated into the services of most email providers. However, optimising text classification algorithms and finding the right tradeoff in their aggressiveness is still a major research problem. We present an updated survey of 12 machine learning text classifiers applied to a public spam corpus. A new pipeline is proposed to optimise hyperparameter selection and improve the models' performance by applying specific methods (based on natural language processing) in the preprocessing stage. Our study aims to provide a new methodology to investigate and optimise the effect of different feature sizes and hyperparameters in machine learning classifiers that are widely used in text classification problems. The classifiers are tested and evaluated on different metrics including F-score (accuracy), precision, recall, and run time. By analysing all these aspects, we show how the proposed pipeline can be used to achieve high accuracy in spam filtering on the Enron dataset, a widely used public email corpus. Statistical tests and explainability techniques are applied to provide a robust analysis of the proposed pipeline and interpret the classification outcomes of the 12 machine learning models, also identifying words that drive the classification results. Our analysis shows that it is possible to identify an effective machine learning model to classify the Enron dataset with an F-score of 94%.
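As an illustration of the kind of classifier such a pipeline compares, here is a minimal bag-of-words multinomial Naive Bayes spam filter with Laplace smoothing (the toy corpus below is invented; the paper evaluates 12 full models on the Enron corpus):

```python
import numpy as np

def train_nb(docs, labels, alpha=1.0):
    """Multinomial Naive Bayes over a bag-of-words vocabulary, with
    Laplace smoothing alpha; returns a prediction function."""
    vocab = sorted({w for d in docs for w in d.split()})
    idx = {w: i for i, w in enumerate(vocab)}
    classes = sorted(set(labels))
    counts = np.full((len(classes), len(vocab)), alpha)  # smoothed word counts
    priors = np.zeros(len(classes))
    for d, y in zip(docs, labels):
        c = classes.index(y)
        priors[c] += 1
        for w in d.split():
            counts[c, idx[w]] += 1
    log_lik = np.log(counts / counts.sum(axis=1, keepdims=True))
    log_prior = np.log(priors / priors.sum())

    def predict(doc):
        x = np.zeros(len(vocab))
        for w in doc.split():
            if w in idx:  # out-of-vocabulary words are ignored
                x[idx[w]] += 1
        return classes[int(np.argmax(log_prior + log_lik @ x))]

    return predict

predict = train_nb(
    ["win free money now", "free prize win", "meeting at noon", "lunch meeting today"],
    ["spam", "spam", "ham", "ham"],
)
print(predict("free money prize"))  # spam
print(predict("noon meeting"))      # ham
```

A real pipeline of the kind surveyed would add preprocessing (stemming, stop-word removal), feature-size selection, and hyperparameter search around a model like this.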
    Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning. (arXiv:2101.10102v2 [cs.LG] UPDATED)
    To analyse local robustness properties of deep neural networks (DNNs), we present a practical framework from a model learning perspective. Based on black-box model learning with scenario optimisation, we abstract the local behaviour of a DNN via an affine model with the probably approximately correct (PAC) guarantee. From the learned model, we can infer the corresponding PAC-model robustness property. The innovation of our work is the integration of model learning into PAC robustness analysis: that is, we construct a PAC guarantee on the model level instead of sample distribution, which induces a more faithful and accurate robustness evaluation. This is in contrast to existing statistical methods without model learning. We implement our method in a prototypical tool named DeepPAC. As a black-box method, DeepPAC is scalable and efficient, especially when DNNs have complex structures or high-dimensional inputs. We extensively evaluate DeepPAC, with 4 baselines (using formal verification, statistical methods, testing and adversarial attack) and 20 DNN models across 3 datasets, including MNIST, CIFAR-10, and ImageNet. It is shown that DeepPAC outperforms the state-of-the-art statistical method PROVERO, and it achieves more practical robustness analysis than the formal verification tool ERAN. Also, its results are consistent with existing DNN testing work like DeepGini.
    Keys to Accurate Feature Extraction Using Residual Spiking Neural Networks. (arXiv:2111.05955v3 [cs.LG] UPDATED)
    Spiking neural networks (SNNs) have become an interesting alternative to conventional artificial neural networks (ANNs) thanks to their temporal processing capabilities and energy-efficient implementations in neuromorphic hardware. However, the challenges involved in training SNNs have limited their performance in terms of accuracy, and thus their applications. Improving learning algorithms and neural architectures for more accurate feature extraction is therefore one of the current priorities in SNN research. In this paper we present a study of the key components of modern spiking architectures. We empirically compare different techniques in image classification datasets taken from the best performing networks. We design a spiking version of the successful residual network architecture and provide an in-depth study of the possible implementations of spiking residual connections. Our results provide a state-of-the-art guide to SNN design, which allows informed choices to be made when trying to build an optimal visual feature extractor. Finally, our network outperforms previous SNN architectures on the CIFAR-10 (94.14%) and CIFAR-100 (74.65%) datasets and matches the state of the art on DVS-CIFAR10 (72.98%), with fewer parameters than the previous state of the art and without the need for ANN-SNN conversion. Code available at https://github.com/VicenteAlex/Spiking_ResNet
    Inspection-L: A Self-Supervised GNN-Based Money Laundering Detection System for Bitcoin. (arXiv:2203.10465v2 [cs.CR] UPDATED)
    Criminals have become increasingly experienced in using cryptocurrencies, such as Bitcoin, for money laundering. The use of cryptocurrencies can hide criminal identities and transfer hundreds of millions of dollars of dirty funds through their criminal digital wallets. However, this is considered a paradox because cryptocurrencies are gold mines for open-source intelligence, allowing law enforcement agencies to have more power in conducting forensic analyses. This paper proposes Inspection-L, a graph neural network (GNN) framework based on self-supervised Deep Graph Infomax (DGI) combined with a supervised learning algorithm, namely Random Forest (RF), to detect illicit transactions for anti-money laundering (AML). To the best of our knowledge, our proposal is the first to apply self-supervised GNNs to the problem of AML in Bitcoin. The proposed method has been evaluated on the Elliptic dataset, and the results show that our approach outperforms the baseline on key classification metrics, demonstrating the potential of self-supervised GNNs for detecting illicit cryptocurrency transactions.
    Meaningful machine learning models and machine-learned pharmacophores from fragment screening campaigns. (arXiv:2204.06348v1 [q-bio.BM])
    Machine learning (ML) is widely used in drug discovery to train models that predict protein-ligand binding. These models are of great value to medicinal chemists, in particular if they provide case-specific insight into the physical interactions that drive the binding process. In this study we derive ML models from over 50 fragment-screening campaigns to introduce two important elements that we believe are absent in most -- if not all -- ML studies of this type reported to date: First, alongside the observed hits we use to train our models, we incorporate true misses and show that these experimentally validated negative data are of significant importance to the quality of the derived models. Second, we provide a physically interpretable and verifiable representation of what the ML model considers important for successful binding. This representation is derived from a straightforward attribution procedure that explains the prediction in terms of the (inter-)action of chemical environments. Critically, we validate the attribution outcome on a large scale against prior annotations made independently by expert molecular modellers. We find good agreement between the key molecular substructures proposed by the ML model and those assigned manually, even when the model's performance in discriminating hits from misses is far from perfect. By projecting the attribution onto predefined interaction prototypes (pharmacophores), we show that ML allows us to formulate simple rules for what drives fragment binding against a target automatically from screening data.
    Online greedy identification of linear dynamical systems. (arXiv:2204.06375v1 [stat.ML])
    This work addresses the problem of exploration in an unknown environment. For linear dynamical systems, we use an experimental design framework and introduce an online greedy policy where the control maximizes the information of the next step. In a setting with a limited number of experimental trials, our algorithm has low complexity and shows experimentally competitive performances compared to more elaborate gradient-based methods.
    Convex-Concave Min-Max Stackelberg Games. (arXiv:2110.05192v4 [cs.GT] UPDATED)
    Min-max optimization problems (i.e., min-max games) have been attracting a great deal of attention because of their applicability to a wide range of machine learning problems. Although significant progress has been made recently, the literature to date has focused on games with independent strategy sets; little is known about solving games with dependent strategy sets, which can be characterized as min-max Stackelberg games. We introduce two first-order methods that solve a large class of convex-concave min-max Stackelberg games, and show that our methods converge in polynomial time. Min-max Stackelberg games were first studied by Wald, under the posthumous name of Wald's maximin model, a variant of which is the main paradigm used in robust optimization, which means that our methods can likewise solve many convex robust optimization problems. We observe that the computation of competitive equilibria in Fisher markets also comprises a min-max Stackelberg game. Further, we demonstrate the efficacy and efficiency of our algorithms in practice by computing competitive equilibria in Fisher markets with varying utility structures. Our experiments suggest potential ways to extend our theoretical results, by demonstrating how different smoothness properties can affect the convergence rate of our algorithms.
    Hybrid Neural Network Augmented Physics-based Models for Nonlinear Filtering. (arXiv:2204.06471v1 [cs.LG])
    In this paper we present a hybrid neural network augmented physics-based modeling (APBM) framework for Bayesian nonlinear latent space estimation. The proposed APBM strategy allows for model adaptation when new operation conditions come into play or the physics-based model is insufficient (or incomplete) to properly describe the latent phenomenon. One advantage of the APBMs and our estimation procedure is the capability of maintaining the physical interpretability of estimated states. Furthermore, we propose a constraint filtering approach to control the neural network contributions to the overall model. We also exploit assumed density filtering techniques and cubature integration rules to present a flexible estimation strategy that can easily deal with nonlinear models and high-dimensional latent spaces. Finally, we demonstrate the efficacy of our methodology by leveraging a target tracking scenario with nonlinear and incomplete measurement and acceleration models, respectively.
    Overparameterized Linear Regression under Adversarial Attacks. (arXiv:2204.06274v1 [stat.ML])
    As machine learning models start to be used in critical applications, their vulnerabilities and brittleness become a pressing concern. Adversarial attacks are a popular framework for studying these vulnerabilities. In this work, we study the error of linear regression in the face of adversarial attacks. We provide bounds on the error in terms of the traditional risk and the parameter norm, and show how these bounds can be leveraged to apply analyses from non-adversarial setups to the adversarial risk. The usefulness of these results is illustrated by shedding light on whether or not overparameterized linear models can be adversarially robust. We show that adding features to linear models might be either a source of additional robustness or of brittleness. We show that these differences arise due to scaling and to how the $\ell_1$ and $\ell_2$ norms of random projections concentrate. We also show how the reformulation we propose allows adversarial training to be solved as a convex optimization problem. This is then used as a tool to study how adversarial training and other regularization methods might affect the robustness of the estimated models.
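For intuition on why the $\ell_1$ norm appears, the inner maximization for a linear model under $\ell_\infty$-bounded input perturbations has a closed form via the dual norm. A sketch with synthetic data (illustrative, not the paper's exact setup):

```python
import numpy as np

def adversarial_sq_error(X, y, theta, eps):
    """Worst-case mean squared error of a linear model under
    l_inf-bounded input perturbations.  For ||delta||_inf <= eps the
    inner maximum has the closed form (|x.theta - y| + eps*||theta||_1)^2,
    because the dual of the l_inf norm is the l_1 norm."""
    resid = np.abs(X @ theta - y)
    return np.mean((resid + eps * np.abs(theta).sum()) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
theta = rng.normal(size=5)
y = X @ theta  # zero clean residual by construction
print(adversarial_sq_error(X, y, theta, eps=0.1))  # equals (0.1 * ||theta||_1)^2
```

Even with zero clean risk, the adversarial risk grows with $\|\theta\|_1$, which is the scaling effect the bounds above make precise.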
    Safer Autonomous Driving in a Stochastic, Partially-Observable Environment by Hierarchical Contingency Planning. (arXiv:2204.06509v1 [cs.LG])
    When learning to act in a stochastic, partially observable environment, an intelligent agent should be prepared to anticipate a change in its belief of the environment state, and be capable of adapting its actions on-the-fly to changing conditions. As humans, we are able to form contingency plans when learning a task with the explicit aim of being able to correct errors in the initial control, and hence prove useful if ever there is a sudden change in our perception of the environment which requires immediate corrective action. This is especially the case for autonomous vehicles (AVs) navigating real-world situations where safety is paramount, and a strong ability to react to a changing belief about the environment is truly needed. In this paper we explore an end-to-end approach, from training to execution, for learning robust contingency plans and combining them with a hierarchical planner to obtain a robust agent policy in an autonomous navigation task where other vehicles' behaviours are unknown, and the agent's belief about these behaviours is subject to sudden, last-second change. We show that our approach results in robust, safe behaviour in a partially observable, stochastic environment, generalizing well over environment dynamics not seen during training.
    Massive MIMO Beam Management in Sub-6 GHz 5G NR. (arXiv:2204.06064v1 [eess.SP])
    Beam codebooks are a new feature of massive multiple-input multiple-output (M-MIMO) in 5G new radio (NR). Codebooks comprised of beamforming vectors are used to transmit reference signals and obtain limited channel state information (CSI) from receivers via the codeword index. This enables large arrays that cannot otherwise obtain sufficient CSI. The performance, however, is limited by the codebook design. In this paper, we show that machine learning can be used to train site-specific codebooks for initial access. We design a neural network based on an autoencoder architecture that uses a beamspace observation in combination with RF environment characteristics to improve the synchronization signal (SS) burst codebook. We test our algorithm using a flexible dataset of channels generated from QuaDRiGa. The results show that our model outperforms the industry standard (DFT beams) and approaches the optimal performance (perfect CSI and singular value decomposition (SVD)-based beamforming), using only a few bits of feedback.
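The DFT baseline mentioned above is easy to reproduce: each codeword steers a uniform linear array toward one spatial frequency, and initial access picks the codeword with the largest beamforming gain. A minimal sketch (the array size and channel below are illustrative):

```python
import numpy as np

def dft_codebook(n_ant, n_beams):
    """DFT beam codebook for a uniform linear array: column m steers
    toward spatial frequency m / n_beams.  Columns are unit-norm."""
    ant = np.arange(n_ant)[:, None]
    beams = np.arange(n_beams)[None, :]
    return np.exp(2j * np.pi * ant * beams / n_beams) / np.sqrt(n_ant)

def best_beam(codebook, h):
    """Index of the codeword f maximizing the gain |h^H f|; in practice
    only this index is fed back, giving limited CSI."""
    return int(np.argmax(np.abs(h.conj() @ codebook)))

n_ant = 16
F = dft_codebook(n_ant, n_ant)
# a channel perfectly aligned with beam 3 is recovered as beam 3
h = F[:, 3]
print(best_beam(F, h))  # 3
```

A learned, site-specific codebook as proposed in the paper would replace the fixed DFT columns with beams adapted to the deployment's RF environment.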
    Data-heterogeneity-aware Mixing for Decentralized Learning. (arXiv:2204.06477v1 [cs.LG])
    Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs. However, most existing approaches toward decentralized learning disregard the interaction between data heterogeneity and graph topology. In this paper, we characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes. We propose a metric that quantifies the ability of a graph to mix the current gradients. We further prove that the metric controls the convergence rate, particularly in settings where the heterogeneity across nodes dominates the stochasticity between updates for a given node. Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric using standard convex constrained optimization and sketching techniques. Through comprehensive experiments on standard computer vision and NLP benchmarks, we show that our approach leads to improvement in test performance for a wide range of tasks.
    Approximation of Lipschitz Functions using Deep Spline Neural Networks. (arXiv:2204.06233v1 [cs.LG])
    Lipschitz-constrained neural networks have many applications in machine learning. Since designing and training expressive Lipschitz-constrained networks is very challenging, there is a need for improved methods and a better theoretical understanding. Unfortunately, it turns out that ReLU networks have provable disadvantages in this setting. Hence, we propose to use learnable spline activation functions with at least 3 linear regions instead. We prove that this choice is optimal among all component-wise $1$-Lipschitz activation functions in the sense that no other weight constrained architecture can approximate a larger class of functions. Additionally, this choice is at least as expressive as the recently introduced non component-wise Groupsort activation function for spectral-norm-constrained weights. Previously published numerical results support our theoretical findings.
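To illustrate the constraint involved, a 1-Lipschitz piecewise-linear activation with three linear regions can be built by clipping all region slopes to $[-1, 1]$ and enforcing continuity at the knots. A numpy sketch (the knots and slopes below are fixed for illustration; the paper's activations are learnable):

```python
import numpy as np

def spline_act(x, knots=(-1.0, 1.0), slopes=(0.5, 1.0, 0.5)):
    """Illustrative 1-Lipschitz piecewise-linear activation with three
    linear regions.  Slopes are clipped to [-1, 1] so the function is
    1-Lipschitz; continuity is enforced by anchoring each outer piece
    at its knot."""
    t0, t1 = knots
    s0, s1, s2 = np.clip(slopes, -1.0, 1.0)
    return np.where(
        x < t0, s1 * t0 + s0 * (x - t0),                 # left region
        np.where(x <= t1, s1 * x,                        # middle region
                 s1 * t1 + s2 * (x - t1)),               # right region
    )

xs = np.linspace(-3, 3, 601)
ys = spline_act(xs)
# finite-difference slope never exceeds 1, confirming the Lipschitz bound
print(np.max(np.abs(np.diff(ys) / np.diff(xs))))
```

The point of allowing at least three regions is expressiveness: a 1-Lipschitz ReLU is restricted to slopes 0 and 1, whereas a spline like this can vary its slope per region while keeping the same constraint.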
    Out-of-distribution Detection with Deep Nearest Neighbors. (arXiv:2204.06507v1 [cs.LG])
    Out-of-distribution (OOD) detection is a critical task for deploying machine learning models in the open world. Distance-based methods have demonstrated promise, where testing samples are detected as OOD if they are relatively far away from in-distribution (ID) data. However, prior methods impose a strong distributional assumption on the underlying feature space, which may not always hold. In this paper, we explore the efficacy of non-parametric nearest-neighbor distance for OOD detection, which has been largely overlooked in the literature. Unlike prior works, our method does not impose any distributional assumption, hence providing stronger flexibility and generality. We demonstrate the effectiveness of nearest-neighbor-based OOD detection on several benchmarks and establish superior performance. Under the same model trained on ImageNet-1k, our method substantially reduces the false positive rate (FPR@TPR95) by 24.77% compared to a strong baseline, SSD+, which uses the parametric Mahalanobis distance for detection.
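The non-parametric score described above reduces to "distance to the $k$-th nearest in-distribution feature". A minimal sketch on synthetic 2-D features (the $\ell_2$ normalization and value of $k$ are common choices, not necessarily the paper's exact settings):

```python
import numpy as np

def knn_ood_scores(train_feats, test_feats, k=5):
    """Non-parametric OOD score: distance to the k-th nearest
    in-distribution training feature (larger means more OOD).
    Features are l2-normalized first, a common practice."""
    tr = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    te = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    d = np.linalg.norm(te[:, None, :] - tr[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k - 1]

rng = np.random.default_rng(0)
id_train = rng.normal(loc=[5, 0], size=(200, 2))   # in-distribution cluster
id_test = rng.normal(loc=[5, 0], size=(50, 2))
ood_test = rng.normal(loc=[-5, 0], size=(50, 2))   # far-away cluster
s_id = knn_ood_scores(id_train, id_test)
s_ood = knn_ood_scores(id_train, ood_test)
print(s_id.mean() < s_ood.mean())  # OOD points score higher
```

A deployment would then threshold the score at the value giving 95% true-positive rate on ID data, which is exactly what the FPR@TPR95 metric measures.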
    Sigma-Delta and Distributed Noise-Shaping Quantization Methods for Random Fourier Features. (arXiv:2106.02614v2 [cs.LG] UPDATED)
    We propose the use of low bit-depth Sigma-Delta and distributed noise-shaping methods for quantizing the Random Fourier features (RFFs) associated with shift-invariant kernels. We prove that our quantized RFFs -- even in the case of $1$-bit quantization -- allow a high accuracy approximation of the underlying kernels, and the approximation error decays at least polynomially fast as the dimension of the RFFs increases. We also show that the quantized RFFs can be further compressed, yielding an excellent trade-off between memory use and accuracy. Namely, the approximation error now decays exponentially as a function of the bits used. Moreover, we empirically show by testing the performance of our methods on several machine learning tasks that our method compares favorably to other state of the art quantization methods in this context.
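The two ingredients above, random Fourier features and first-order Sigma-Delta quantization, can be sketched as follows. This is an illustrative sketch under standard conventions (Gaussian kernel with unit bandwidth, greedy first-order noise shaping); the paper's distributed noise-shaping schemes are more general.

```python
import numpy as np

def rff(X, W, b):
    """Random Fourier features for the Gaussian kernel:
    z(x) = sqrt(2/m) * cos(W x + b), so z(x).z(y) ~ exp(-||x-y||^2 / 2)."""
    m = W.shape[0]
    return np.sqrt(2.0 / m) * np.cos(X @ W.T + b)

def sigma_delta_1bit(z):
    """First-order Sigma-Delta quantization to {-1, +1}: a greedy scheme
    that carries the running quantization error u, shaping the error into
    high frequencies instead of spreading it uniformly."""
    q = np.empty_like(z)
    u = np.zeros(z.shape[0])
    for j in range(z.shape[1]):
        q[:, j] = np.where(z[:, j] + u >= 0, 1.0, -1.0)
        u = u + z[:, j] - q[:, j]   # residual stays bounded when |z| <= 1
    return q
```

The key property the paper exploits is that the running sums of `q` track those of `z` (the residual `u` stays bounded), which is what lets the kernel approximation survive 1-bit quantization.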
    TranAD: Deep Transformer Networks for Anomaly Detection in Multivariate Time Series Data. (arXiv:2201.07284v5 [cs.LG] UPDATED)
Efficient anomaly detection and diagnosis in multivariate time-series data is of great importance for modern industrial applications. However, building a system that is able to quickly and accurately pinpoint anomalous observations is a challenging problem. This is due to the lack of anomaly labels, high data volatility and the demands of ultra-low inference times in modern applications. Despite recent developments in deep learning approaches for anomaly detection, only a few of them can address all of these challenges. In this paper, we propose TranAD, a deep transformer network-based anomaly detection and diagnosis model which uses attention-based sequence encoders to swiftly perform inference with the knowledge of the broader temporal trends in the data. TranAD uses focus score-based self-conditioning to enable robust multi-modal feature extraction and adversarial training to gain stability. Additionally, model-agnostic meta learning (MAML) allows us to train the model using limited data. Extensive empirical studies on six publicly available datasets demonstrate that TranAD can outperform state-of-the-art baseline methods in detection and diagnosis performance with data and time-efficient training. Specifically, TranAD increases F1 scores by up to 17% while reducing training times by up to 99% compared to the baselines.
    ADASYN-Random Forest Based Intrusion Detection Model. (arXiv:2105.04301v5 [cs.CR] UPDATED)
Intrusion detection has been a key topic in the field of cyber security, and common network threats nowadays are varied and constantly evolving. Because severe class imbalance in intrusion detection datasets leads to poor classification performance on attack behaviors with small sample sizes and makes it difficult to detect network attacks accurately and efficiently, this paper proposes using the Adaptive Synthetic Sampling (ADASYN) method to balance the datasets. In addition, the Random Forest algorithm was used to train intrusion detection classifiers. Through a comparative intrusion detection experiment on the CICIDS 2017 dataset, it is found that ADASYN with Random Forest performs better. Based on the experimental results, the improvement in precision, recall, F1 score and AUC after applying ADASYN is then analyzed. Experiments show that the proposed method can be applied to intrusion detection on large datasets and can effectively improve the classification accuracy on network attack behaviors. Compared with traditional machine learning models, it offers better performance, generalization ability and robustness.
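The ADASYN balancing step can be sketched with numpy alone. This is an illustrative, minimal version (not the paper's implementation; in practice one would use `imblearn.over_sampling.ADASYN` plus `sklearn`'s Random Forest): each minority sample receives a share of the synthetic budget proportional to how many of its k nearest neighbors belong to the majority class, so harder border regions get more synthetic data.

```python
import numpy as np

def adasyn(X_min, X_maj, k=5, seed=0):
    """Minimal ADASYN-style oversampling sketch (numpy only)."""
    rng = np.random.default_rng(seed)
    X_all = np.vstack([X_min, X_maj])
    is_maj = np.arange(len(X_all)) >= len(X_min)
    G = len(X_maj) - len(X_min)               # synthetic samples needed
    r = np.empty(len(X_min))
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X_all - x, axis=1)
        nn = np.argsort(d)[1:k + 1]           # skip the point itself
        r[i] = is_maj[nn].mean()              # local "learning difficulty"
    r = r / r.sum() if r.sum() > 0 else np.full(len(X_min), 1.0 / len(X_min))
    counts = np.round(r * G).astype(int)
    synth = []
    for i, g in enumerate(counts):
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nn = np.argsort(d)[1:k + 1]           # minority-class neighbors
        for _ in range(g):
            j = rng.choice(nn)
            synth.append(X_min[i] + rng.random() * (X_min[j] - X_min[i]))
    return np.array(synth) if synth else np.empty((0, X_min.shape[1]))
```

The balanced set (original minority + synthetics + majority) is then what the Random Forest classifier is trained on.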
    Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency. (arXiv:2112.06054v3 [cs.LG] UPDATED)
Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications. Many studies improve sample efficiency by extending adversarial imitation to be off-policy, even though these off-policy extensions can change the original objective or involve complicated optimization. We revisit the foundation of adversarial imitation and propose an off-policy sample efficient approach that requires no adversarial training or min-max optimization. Our formulation capitalizes on two key insights: (1) the similarity between the Bellman equation and the stationary state-action distribution equation allows us to derive a novel temporal difference (TD) learning approach; and (2) the use of a deterministic policy simplifies the TD learning. Combined, these insights yield a practical algorithm, Deterministic and Discriminative Imitation (D2-Imitation), which operates by first partitioning samples into two replay buffers and then learning a deterministic policy via off-policy reinforcement learning. Our empirical results show that D2-Imitation is effective in achieving good sample efficiency, outperforming several off-policy extension approaches of adversarial imitation on many control tasks.
    Enabling Synthetic Data adoption in regulated domains. (arXiv:2204.06297v1 [cs.LG])
The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than on algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, such as Privacy Enhancing Technologies. However, they frequently cause a loss of information, putting forward a crucial trade-off between data quality and privacy. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process that learns the real data's properties. Both Academia and Industry have realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use-case, a variety of generative algorithms have been trained on real-world Credit Bureau Data. The best model has been assessed, using DAISYnt on the different synthetic replicas. Further potential uses, among others, entail auditing and fine-tuning of generative models or ensuring high quality of a given synthetic dataset. From a prescriptive viewpoint, DAISYnt may eventually pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
    Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation. (arXiv:2204.06517v1 [cs.IR])
User interests are usually dynamic in the real world, which poses both theoretical and practical challenges for learning accurate preferences from rich behavior data. Among existing user behavior modeling solutions, attention networks are widely adopted for their effectiveness and relative simplicity. Despite being extensively studied, existing attentions still suffer from two limitations: i) conventional attentions mainly take into account the spatial correlation between user behaviors, regardless of the distance between those behaviors in the continuous time space; and ii) these attentions mostly provide a dense and undifferentiated distribution over all past behaviors and then attentively encode them into the output latent representations. This is, however, not suitable in practical scenarios where a user's future actions are relevant to a small subset of her/his historical behaviors. In this paper, we propose a novel attention network, named self-modulating attention, that models the complex and non-linearly evolving dynamic user preferences. We empirically demonstrate the effectiveness of our method on top-N sequential recommendation tasks, and the results on three large-scale real-world datasets show that our model can achieve state-of-the-art performance.
    A Review of Machine Learning Methods Applied to Structural Dynamics and Vibroacoustic. (arXiv:2204.06362v1 [cs.LG])
The use of Machine Learning (ML) has rapidly spread across several fields, having encountered many applications in Structural Dynamics and Vibroacoustic (SD\&V). The increasing capabilities of ML to unveil insights from data, driven by unprecedented data availability, algorithmic advances and computational power, enhance decision making, uncertainty handling, pattern recognition and real-time assessments. Three main applications in SD\&V have taken advantage of these benefits. In Structural Health Monitoring, ML detection and prognosis lead to safe operation and optimized maintenance schedules. System identification and control design are leveraged by ML techniques in Active Noise Control and Active Vibration Control. Finally, the so-called ML-based surrogate models provide fast alternatives to costly simulations, enabling robust and optimized product design. Despite the many works in the area, they have not yet been systematically reviewed and analyzed. Therefore, to keep track of and understand this ongoing integration of fields, this paper presents a survey of ML applications in SD\&V analyses, shedding light on the current state of implementation and emerging opportunities. The main methodologies, advantages, limitations, and recommendations based on scientific knowledge were identified for each of the three applications. Moreover, the paper considers the role of Digital Twins and Physics Guided ML to overcome current challenges and power future research progress. As a result, the survey provides a broad overview of the present landscape of ML applied in SD\&V and guides the reader to an advanced understanding of progress and prospects in the field.
    Local and global topological complexity measures of ReLU neural network functions. (arXiv:2204.06062v1 [math.AT])
    We apply a generalized piecewise-linear (PL) version of Morse theory due to Grunert-Kuhnel-Rote to define and study new local and global notions of topological complexity for fully-connected feedforward ReLU neural network functions, F: R^n -> R. Along the way, we show how to construct, for each such F, a canonical polytopal complex K(F) and a deformation retract of the domain onto K(F), yielding a convenient compact model for performing calculations. We also give a combinatorial description of local complexity for depth 2 networks, and a construction showing that local complexity can be arbitrarily high.
    Do We Need Anisotropic Graph Neural Networks?. (arXiv:2104.01481v4 [cs.LG] UPDATED)
Common wisdom in the graph neural network (GNN) community dictates that anisotropic models -- in which messages sent between nodes are a function of both the source and target node -- are required to achieve state-of-the-art performance. Benchmarks to date have demonstrated that these models perform better than comparable isotropic models -- where messages are a function of the source node only. In this work we provide empirical evidence challenging this narrative: we propose an isotropic GNN, which we call Efficient Graph Convolution (EGC), that consistently outperforms comparable anisotropic models, including the popular GAT and PNA architectures, by using spatially-varying adaptive filters. In addition to raising important questions for the GNN community, our work has significant real-world implications for efficiency. EGC achieves higher model accuracy, with lower memory consumption and latency, along with characteristics suited to accelerator implementation, while being a drop-in replacement for existing architectures. As an isotropic model, it requires memory proportional to the number of vertices in the graph ($\mathcal{O}(V)$); in contrast, anisotropic models require memory proportional to the number of edges ($\mathcal{O}(E)$). We demonstrate that EGC outperforms existing approaches across 6 large and diverse benchmark datasets, and conclude by discussing questions that our work raises for the community going forward. Code and pretrained models for our experiments are provided at https://github.com/shyam196/egc.
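The $\mathcal{O}(V)$-vs-$\mathcal{O}(E)$ memory distinction can be made concrete with a toy message-passing sketch (illustrative only, not EGC itself; in a fused GPU kernel the gather-scatter below would not materialize intermediates):

```python
import numpy as np

def isotropic_conv(X, src, dst, W):
    """Isotropic aggregation: each message depends only on the source
    node, so one V x d message table is computed and scatter-added along
    edges -- message memory scales with the number of nodes."""
    msgs = X @ W                      # one message per node
    out = np.zeros((X.shape[0], W.shape[1]))
    np.add.at(out, dst, msgs[src])
    return out

def anisotropic_conv(X, src, dst, W_s, W_t):
    """Anisotropic aggregation (GAT/PNA-style): each message mixes source
    and target features, so an E x d per-edge tensor must be
    materialized -- message memory scales with the number of edges."""
    edge_msgs = X[src] @ W_s + X[dst] @ W_t    # one message per edge
    out = np.zeros((X.shape[0], W_s.shape[1]))
    np.add.at(out, dst, edge_msgs)
    return out
```

With `W_t = 0` the anisotropic layer collapses to the isotropic one, which is exactly the structural restriction EGC compensates for with its spatially-varying adaptive filters.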
    Slope stability predictions on spatially variable random fields using machine learning surrogate models. (arXiv:2204.06097v1 [cs.LG])
Random field Monte Carlo (MC) reliability analysis is a robust stochastic method to determine the probability of failure. This method, however, requires a large number of numerical simulations demanding high computational costs. This paper explores the efficiency of different machine learning (ML) algorithms used as surrogate models trained on a limited number of random field slope stability simulations in predicting the results of large datasets. The MC data in this paper require only the examination of failure or non-failure, circumventing the time-consuming calculation of factors of safety. An extensive dataset is generated, consisting of 120,000 finite difference MC slope stability simulations incorporating different levels of soil heterogeneity and anisotropy. The Bagging Ensemble, Random Forest and Support Vector classifiers are found to be the superior models for this problem amongst 9 different models and ensemble classifiers. Trained on only 0.47% of the data (500 samples), the ML model can classify the entire 120,000 samples with an accuracy of 85% and an AUC score of 91%. The performance of the ML methods in classifying the random field slope stability results generally degrades with higher anisotropy and heterogeneity of the soil. The ML-assisted MC reliability analysis proves to be a robust stochastic method, where the error in the predicted probability of failure using 5% of the MC data is only 0.46% on average. The approach reduced the computational time from 306 days to less than 6 hours.
    Conditional Gradients for the Approximately Vanishing Ideal. (arXiv:2202.03349v6 [cs.LG] UPDATED)
The vanishing ideal of a set of points $X\subseteq \mathbb{R}^n$ is the set of polynomials that evaluate to $0$ over all points $\mathbf{x} \in X$ and admits an efficient representation by a finite set of polynomials called generators. To accommodate the noise in the data set, we introduce the Conditional Gradients Approximately Vanishing Ideal algorithm (CGAVI) for the construction of the set of generators of the approximately vanishing ideal. The constructed set of generators captures polynomial structures in data and gives rise to a feature map that can, for example, be used in combination with a linear classifier for supervised learning. In CGAVI, we construct the set of generators by solving specific instances of (constrained) convex optimization problems with the Pairwise Frank-Wolfe algorithm (PFW). Among other things, the constructed generators inherit the LASSO generalization bound and vanish not only on the training data but also on out-of-sample data. Moreover, CGAVI admits a compact representation of the approximately vanishing ideal by constructing few generators with sparse coefficient vectors.
    Deep Annotation of Therapeutic Working Alliance in Psychotherapy. (arXiv:2204.05522v1 [q-bio.NC] CROSS LISTED)
The therapeutic working alliance is an important predictor of the outcome of psychotherapy treatment. In practice, the working alliance is estimated from a set of scoring questionnaires in an inventory that both the patient and the therapist fill out. In this work, we propose an analytical framework for directly inferring the therapeutic working alliance from the natural language within psychotherapy sessions at a turn-level resolution, using deep embeddings such as the Doc2Vec and SentenceBERT models. The transcript of each psychotherapy session can be generated in real time from the session's speech recordings, and these embedded dialogues are compared with the distributed representations of the statements in the working alliance inventory. We demonstrate, on a real-world dataset with over 950 sessions of psychotherapy treatments for patients with anxiety, depression, schizophrenia and suicidality, the effectiveness of this method in mapping out trajectories of patient-therapist alignment and the interpretability it can offer for insights in clinical psychiatry. We believe such a framework can provide timely feedback to the therapist regarding the quality of the conversation in interview sessions.
    Neural Operator with Regularity Structure for Modeling Dynamics Driven by SPDEs. (arXiv:2204.06255v1 [cs.LG])
Stochastic partial differential equations (SPDEs) are significant tools for modeling dynamics in many areas including atmospheric sciences and physics. Neural operators, generalizations of neural networks with the capability of learning maps between infinite-dimensional spaces, are strong tools for solving parametric PDEs. However, they lack the ability to model SPDEs, which usually have poor regularity due to the driving noise. As the theory of regularity structures has achieved great success in analyzing SPDEs and provides the concept of model feature vectors that well-approximate SPDEs' solutions, we propose the Neural Operator with Regularity Structure (NORS), which incorporates these feature vectors for modeling dynamics driven by SPDEs. We conduct experiments on a variety of SPDEs, including the dynamic $\Phi^4_1$ model and the 2d stochastic Navier-Stokes equation, and the results demonstrate that NORS is resolution-invariant, efficient, and achieves one order of magnitude lower error with a modest amount of data.
    Random Graph Embedding and Joint Sparse Regularization for Multi-label Feature Selection. (arXiv:2204.06445v1 [stat.ML])
Multi-label learning is often used to mine the correlation between variables and multiple labels, and its research focuses on fully extracting the information between variables and labels. The $\ell_{2,1}$ regularization is often used to obtain a sparse coefficient matrix, but the problem of multicollinearity among variables cannot be effectively solved. In this paper, the proposed model can choose the most relevant variables by solving a joint constrained optimization problem using the $\ell_{2,1}$ regularization and Frobenius regularization. In manifold regularization, we carry out a random walk strategy based on the joint structure to construct a neighborhood graph, which is highly robust to outliers. In addition, we give an iterative algorithm for the proposed method and prove the convergence of this algorithm. Experiments on real-world data sets also show that the comprehensive performance of our method is consistently better than that of classical methods.
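The joint penalty at the heart of this abstract is easy to state in code. A hedged sketch (the loss weights `alpha`, `beta` and the least-squares fit term are illustrative, not the paper's exact formulation):

```python
import numpy as np

def l21_norm(W):
    """The l_{2,1} norm: sum of row-wise l_2 norms. Penalizing it zeroes
    out entire rows of the coefficient matrix, i.e. discards whole
    features across all labels at once."""
    return float(np.linalg.norm(W, axis=1).sum())

def joint_objective(X, Y, W, alpha, beta):
    """Sketch of a jointly regularized least-squares objective combining
    the l_{2,1} penalty (row sparsity for feature selection) with a
    Frobenius penalty, whose ridge-like shrinkage is what mitigates
    multicollinearity among correlated variables."""
    fit = np.linalg.norm(X @ W - Y, 'fro') ** 2
    return fit + alpha * l21_norm(W) + beta * np.linalg.norm(W, 'fro') ** 2
```

Features are then ranked by the row norms of the optimized `W`, and the top-ranked rows are the selected variables.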
    Is Speech Pathology a Biomarker in Automatic Speaker Verification?. (arXiv:2204.06450v1 [cs.SD])
With the advancements in deep learning (DL) and an increasing interest in data-driven speech processing methods, a major challenge for speech data scientists in the healthcare domain is the anonymization of pathological speech, which is a required step to be able to make it accessible as a public training resource. In this paper, we investigate pathological speech data and compare their speaker verifiability with that of healthy individuals. We utilize a large pathological speech corpus of more than 2,000 test subjects of different ages with various speech and voice disorders and apply DL-based automatic speaker verification (ASV) techniques. As a result, we obtained a mean equal error rate (EER) of 0.86% with a standard deviation of 0.16%, which is a factor of three lower than that of comparable healthy speech databases. We further perform detailed analyses of external factors influencing ASV such as age, pathology, recording environment, and utterance length, to explore their respective effects. Our findings indicate that speech pathology is a potential biomarker in ASV. This is potentially of high interest for the anonymization of pathological speech data.
    Flexible Multiple-Objective Reinforcement Learning for Chip Placement. (arXiv:2204.06407v1 [cs.LG])
Recently, successful applications of reinforcement learning to chip placement have emerged. Pretrained models are necessary to improve efficiency and effectiveness. Currently, the weights of objective metrics (e.g., wirelength, congestion, and timing) are fixed during pretraining. However, fixed-weight models cannot generate the diversity of placements required for engineers to accommodate changing requirements as they arise. This paper proposes flexible multiple-objective reinforcement learning (MORL) to support objective functions with inference-time variable weights using just a single pretrained model. Our macro placement results show that MORL can generate the Pareto frontier of multiple objectives effectively.
    Epistemic Neural Networks. (arXiv:2107.08924v2 [cs.LG] UPDATED)
Effective decision, exploration, and adaptation often require an agent to know what it knows and, also, what it does not know. This capability relies on the quality of \textit{joint} predictions of labels assigned to multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. By assessing the quality of joint predictions it is possible to determine whether a neural network effectively distinguishes between epistemic uncertainty (that due to lack of knowledge) and aleatoric uncertainty (that due to chance). We introduce the \textit{epistemic neural network} (ENN) as a general interface for uncertainty modeling in deep learning. While prior approaches to uncertainty modeling can be viewed as ENNs, the new interface facilitates comparison of joint predictions, and the design of novel architectures and algorithms. In particular, we introduce the \textit{epinet}: an architecture that can supplement any existing neural network, including pretrained models, and be trained with modest incremental computation to represent uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and sequential decision problems. As part of this effort we open-source experiment code.
    CowClip: Reducing CTR Prediction Model Training Time from 12 hours to 10 minutes on 1 GPU. (arXiv:2204.06240v1 [cs.LG])
The click-through rate (CTR) prediction task is to predict whether a user will click on the recommended item. As mind-boggling amounts of data are produced online daily, accelerating CTR prediction model training is critical to ensuring an up-to-date model and reducing the training cost. One approach to increase the training speed is to apply large batch training. However, as shown in computer vision and natural language processing tasks, training with a large batch easily suffers from a loss of accuracy. Our experiments show that previous scaling rules fail in the training of CTR prediction neural networks. To tackle this problem, we first theoretically show that the differing frequencies of ids make it challenging to scale hyperparameters when scaling the batch size. To stabilize the training process in a large batch size setting, we develop adaptive Column-wise Clipping (CowClip). It enables an easy and effective scaling rule for the embeddings, which keeps the learning rate unchanged and scales the L2 loss. We conduct extensive experiments with four CTR prediction networks on two real-world datasets and successfully scale the batch size to 128 times the original without accuracy loss. In particular, for CTR prediction model DeepFM training on the Criteo dataset, our optimization framework enlarges the batch size from 1K to 128K with over 0.1% AUC improvement and reduces training time from 12 hours to 10 minutes on a single V100 GPU. Our code is available at https://github.com/zhengzangw/LargeBatchCTR.
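The clipping idea can be sketched as follows. This is a hedged, illustrative version (the `ratio` threshold and per-row granularity are assumptions, not the paper's exact rule): each embedding row's gradient is capped relative to that row's weight norm, so rare ids, whose statistics differ wildly from frequent ids, stay stable when the batch size is scaled up. The full recipe additionally keeps the learning rate fixed and scales the L2 penalty with the batch-size ratio.

```python
import numpy as np

def columnwise_clip(grad, emb, ratio=0.01, eps=1e-8):
    """CowClip-flavoured adaptive clipping sketch: cap each embedding
    row's gradient norm at `ratio` times that row's weight norm."""
    g = np.linalg.norm(grad, axis=1, keepdims=True)
    limit = ratio * np.linalg.norm(emb, axis=1, keepdims=True) + eps
    return grad * np.minimum(1.0, limit / (g + eps))
```

Rows whose embeddings are still near zero (rare ids) thus receive proportionally tiny updates, while well-trained frequent ids are barely affected.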
    Large-scale multi-objective influence maximisation with network downscaling. (arXiv:2204.06250v1 [cs.SI])
    Finding the most influential nodes in a network is a computationally hard problem with several possible applications in various kinds of network-based problems. While several methods have been proposed for tackling the influence maximisation (IM) problem, their runtime typically scales poorly when the network size increases. Here, we propose an original method, based on network downscaling, that allows a multi-objective evolutionary algorithm (MOEA) to solve the IM problem on a reduced scale network, while preserving the relevant properties of the original network. The downscaled solution is then upscaled to the original network, using a mechanism based on centrality metrics such as PageRank. Our results on eight large networks (including two with $\sim$50k nodes) demonstrate the effectiveness of the proposed method with a more than 10-fold runtime gain compared to the time needed on the original network, and an up to $82\%$ time reduction compared to CELF.
    DT2CAM: A Decision Tree to Content Addressable Memory Framework. (arXiv:2204.06114v1 [cs.AR])
Decision trees are considered one of the most powerful tools for data classification. Accelerating the decision tree search is crucial for on-the-edge applications that have limited power and latency budgets. In this paper, we propose a Content Addressable Memory (CAM) Compiler for Decision Tree (DT) inference acceleration. We propose a novel "adaptive-precision" scheme that results in a compact implementation and enables an efficient bijective mapping to Ternary Content Addressable Memories while maintaining high inference accuracies. In addition, a Resistive-CAM (ReCAM) functional synthesizer is developed for mapping the decision tree to the ReCAM and performing functional simulations for energy, latency, and accuracy evaluations. We study the decision tree accuracy under hardware non-idealities including device defects, manufacturing variability, and input encoding noise. We test our framework on various DT datasets including \textit{Give Me Some Credit}, \textit{Titanic}, and \textit{COVID-19}. Our results reveal up to 42.4% energy savings and an up to 17.8x better energy-delay-area product compared to state-of-the-art hardware accelerators, and up to 333 million decisions per second for the pipelined implementation.
    Learnable Hypergraph Laplacian for Hypergraph Learning. (arXiv:2106.06666v2 [cs.LG] UPDATED)
Hypergraph Convolutional Neural Networks (HGCNNs) have demonstrated their potential in modeling high-order relations preserved in graph-structured data. However, most existing convolution filters are localized and determined by the pre-defined initial hypergraph topology, neglecting to explore implicit and long-range relations in real-world data. In this paper, we propose the first learning-based method tailored for constructing adaptive hypergraph structure, termed HypERgrAph Laplacian aDaptor (HERALD), which serves as a generic plug-and-play module for improving the representational power of HGCNNs. Specifically, HERALD adaptively optimizes the adjacency relationship between vertices and hyperedges in an end-to-end manner, and thus a task-aware hypergraph is learned. Furthermore, HERALD employs the self-attention mechanism to capture non-local pairwise node relations. Extensive experiments on various popular hypergraph datasets for node classification and graph classification tasks demonstrate that our approach obtains consistent and considerable performance enhancement, proving its effectiveness and generalization ability.
    InCoder: A Generative Model for Code Infilling and Synthesis. (arXiv:2204.05999v1 [cs.SE])
    Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and moved to the end of each file, allowing code infilling with bidirectional context. Our model is the first generative model that is able to directly perform zero-shot code infilling, which we evaluate on challenging tasks such as type inference, comment generation, and variable re-naming. We find that the ability to condition on bidirectional context substantially improves performance on these tasks, while still performing comparably on standard program synthesis benchmarks in comparison to left-to-right only models pretrained at similar scale. The InCoder models and code are publicly released. https://sites.google.com/view/incoder-code-models
    An Analysis on Ensemble Learning optimized Medical Image Classification with Deep Convolutional Neural Networks. (arXiv:2201.11440v2 [cs.CV] UPDATED)
Novel and high-performance medical image classification pipelines are heavily utilizing ensemble learning strategies. The idea of ensemble learning is to assemble diverse models or multiple predictions and, thus, boost prediction performance. However, it is still an open question to what extent, as well as which, ensemble learning strategies are beneficial in deep learning based medical image classification pipelines. In this work, we proposed a reproducible medical image classification pipeline for analyzing the performance impact of the following ensemble learning techniques: Augmenting, Stacking, and Bagging. The pipeline consists of state-of-the-art preprocessing and image augmentation methods as well as 9 deep convolutional neural network architectures. It was applied to four popular medical imaging datasets with varying complexity. Furthermore, 12 pooling functions for combining multiple predictions were analyzed, ranging from simple statistical functions like unweighted averaging up to more complex learning-based functions like support vector machines. Our results revealed that Stacking achieved the largest performance gain of up to a 13% F1-score increase. Augmenting showed consistent improvement capabilities of up to 4% and is also applicable to single model based pipelines. Cross-validation based Bagging demonstrated significant performance gain close to Stacking, which resulted in an F1-score increase of up to +11%. Furthermore, we demonstrated that simple statistical pooling functions are equal to or often even better than more complex pooling functions. We concluded that the integration of ensemble learning techniques is a powerful method for any medical image classification pipeline to improve robustness and boost performance.
    Decentralized Collaborative Learning Framework for Next POI Recommendation. (arXiv:2204.06516v1 [cs.IR])
Next Point-of-Interest (POI) recommendation has become an indispensable functionality in Location-based Social Networks (LBSNs) due to its effectiveness in helping people decide the next POI to visit. However, accurate recommendation requires a vast amount of historical check-in data, thus threatening user privacy as the location-sensitive data needs to be handled by cloud servers. Although there have been several on-device frameworks for privacy-preserving POI recommendations, they are still resource-intensive when it comes to storage and computation, and show limited robustness to the high sparsity of user-POI interactions. On this basis, we propose a novel decentralized collaborative learning framework for POI recommendation (DCLR), which allows users to train their personalized models locally in a collaborative manner. DCLR significantly reduces the local models' dependence on the cloud for training, and can be used to expand arbitrary centralized recommendation models. To counteract the sparsity of on-device user data when learning each local model, we design two self-supervision signals to pretrain the POI representations on the server with geographical and categorical correlations of POIs. To facilitate collaborative learning, we innovatively propose to incorporate knowledge from either geographically or semantically similar users into each local model with attentive aggregation and mutual information maximization. The collaborative learning process makes use of communications between devices while requiring only minor engagement from the central server for identifying user groups, and is compatible with common privacy preservation mechanisms like differential privacy.
    Negative Sampling for Recommendation. (arXiv:2204.06520v1 [cs.IR])
    How to effectively sample high-quality negative instances is important for training a recommendation model well. We argue that a high-quality negative should be both \textit{informative} and \textit{unbiased}. Although previous studies have proposed approaches to address informativeness in negative sampling, little has been done to discriminate false negatives from true negatives for unbiased negative sampling, not to mention taking both into consideration. This paper first adopts a parameter learning perspective to analyze negative informativeness and unbiasedness in loss gradient-based model training. We argue that both negative sampling and collaborative filtering include an implicit task of negative classification, from which we report an insightful yet beneficial finding about the order relation in predicted negatives' scores. Based on our finding and by regarding negatives as random variables, we next derive the class-conditional density of true negatives and that of false negatives. We also design a Bayesian classifier for negative classification, from which we define a quantitative unbiasedness measure for negatives. Finally, we propose to use the harmonic mean of informativeness and unbiasedness to sample high-quality negatives. Experimental studies validate the superiority of our negative sampling algorithm over its peers in terms of better sampling quality and better recommendation performance.
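    The final sampling criterion can be sketched as follows, assuming per-candidate informativeness and unbiasedness scores in [0, 1] are already available (how those scores are computed is the substance of the paper; the function names here are illustrative, not the authors'):

```python
import numpy as np

def harmonic_quality(informativeness, unbiasedness, eps=1e-12):
    """Harmonic mean of the two scores: high only when BOTH are high,
    so an informative but likely-false negative is still penalized."""
    i = np.asarray(informativeness, dtype=float)
    u = np.asarray(unbiasedness, dtype=float)
    return 2.0 * i * u / (i + u + eps)

def sample_negative(informativeness, unbiasedness, rng=None):
    """Pick one negative with probability proportional to its quality."""
    rng = rng or np.random.default_rng(0)
    q = harmonic_quality(informativeness, unbiasedness)
    return int(rng.choice(len(q), p=q / q.sum()))
```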
    VisCUIT: Visual Auditor for Bias in CNN Image Classifier. (arXiv:2204.05899v2 [cs.CV] UPDATED)
    CNN image classifiers are widely used, thanks to their efficiency and accuracy. However, they can suffer from biases that impede their practical applications. Most existing bias investigation techniques are either inapplicable to general image classification tasks or require significant user efforts in perusing all data subgroups to manually specify which data attributes to inspect. We present VisCUIT, an interactive visualization system that reveals how and why a CNN classifier is biased. VisCUIT visually summarizes the subgroups on which the classifier underperforms and helps users discover and characterize the cause of the underperformances by revealing image concepts responsible for activating neurons that contribute to misclassifications. VisCUIT runs in modern browsers and is open-source, allowing people to easily access and extend the tool to other model architectures and datasets. VisCUIT is available at the following public demo link: https://poloclub.github.io/VisCUIT. A video demo is available at https://youtu.be/eNDbSyM4R_4.
    Coverage and Capacity Optimization in STAR-RISs Assisted Networks: A Machine Learning Approach. (arXiv:2204.06390v1 [cs.IT])
    Coverage and capacity are important metrics for performance evaluation in wireless networks, but they exhibit several conflicting relationships, e.g., high transmit power contributes to large coverage but high inter-cell interference reduces capacity performance. Therefore, in order to strike a balance between coverage and capacity, a novel model is proposed for the coverage and capacity optimization of simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RISs) assisted networks. To solve the coverage and capacity optimization (CCO) problem, a machine learning-based multi-objective optimization algorithm, i.e., the multi-objective proximal policy optimization (MO-PPO) algorithm, is proposed. The core of this algorithm is a loss function-based update strategy, which calculates weights for both the coverage loss and the capacity loss with a min-norm solver at each update. The numerical results demonstrate that the investigated update strategy outperforms fixed weight-based MO algorithms.
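    For two objectives, a min-norm solver over loss gradients has a well-known closed form (the two-task case of MGDA-style multi-objective optimization). The sketch below illustrates that closed form; it is an assumption that the paper's solver reduces to this in the two-loss case, and the function name is mine:

```python
import numpy as np

def min_norm_weights(g1, g2):
    """Weights (w1, w2) on the simplex minimizing ||w1*g1 + w2*g2||^2
    for two task gradients, e.g. the coverage and capacity losses.
    Setting the derivative of ||a*g1 + (1-a)*g2||^2 to zero gives
    a = (g2 - g1).g2 / ||g1 - g2||^2, clipped to [0, 1].
    """
    diff = g1 - g2
    denom = float(diff @ diff)
    if denom == 0.0:              # identical gradients: any mix is optimal
        return 0.5, 0.5
    w1 = float(np.clip((g2 - g1) @ g2 / denom, 0.0, 1.0))
    return w1, 1.0 - w1
```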
    GenIE: Generative Information Extraction. (arXiv:2112.08340v3 [cs.CL] UPDATED)
    Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code, data and models available at https://github.com/epfl-dlab/GenIE.
    Efficient Non-parametric Bayesian Hawkes Processes. (arXiv:1810.03730v5 [cs.LG] UPDATED)
    In this paper, we develop an efficient non-parametric Bayesian estimation of the kernel function of Hawkes processes. The non-parametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the finite support assumption of the Hawkes process, we efficiently sample random branching structures and thus split the Hawkes process into clusters of Poisson processes. We derive two algorithms -- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization -- and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show that our methods can infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity of different content types, such as music or pet videos.
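    The cluster representation used above can be made concrete with a small forward-simulation sketch: immigrant events arrive as a homogeneous Poisson process, and each event spawns a Poisson number of offspring, forming the clusters of Poisson processes the paper splits the process into. This is illustrative only (an exponential kernel and these parameter names are my assumptions, and the paper's contribution is inference, not simulation):

```python
import numpy as np

def simulate_hawkes_clusters(mu, branching, decay, horizon, seed=0):
    """Sample a Hawkes process on [0, horizon) via its cluster
    (branching) representation. Immigrants arrive with rate `mu`;
    each event spawns Poisson(`branching`) children with
    Exponential(`decay`) delays. Requires branching < 1 so that
    cascades die out.
    """
    rng = np.random.default_rng(seed)
    n_imm = rng.poisson(mu * horizon)
    queue = list(rng.uniform(0.0, horizon, size=n_imm))  # immigrants
    events = []
    while queue:
        t = queue.pop()
        events.append(t)
        for _ in range(rng.poisson(branching)):          # offspring
            child = t + rng.exponential(1.0 / decay)
            if child < horizon:
                queue.append(child)
    return np.sort(np.array(events))
```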
    Automatic Multi-Label Prompting: Simple and Interpretable Few-Shot Classification. (arXiv:2204.06305v1 [cs.CL])
    Prompt-based learning (i.e., prompting) is an emerging paradigm for exploiting knowledge learned by a pretrained language model. In this paper, we propose Automatic Multi-Label Prompting (AMuLaP), a simple yet effective method to automatically select label mappings for few-shot text classification with prompting. Our method exploits one-to-many label mappings and a statistics-based algorithm to select label mappings given a prompt template. Our experiments demonstrate that AMuLaP achieves competitive performance on the GLUE benchmark without human effort or external resources.
    Active Diffusion and VCA-Assisted Image Segmentation of Hyperspectral Images. (arXiv:2204.06298v1 [cs.CV])
    Hyperspectral images encode rich structure that can be exploited for material discrimination by machine learning algorithms. This article introduces the Active Diffusion and VCA-Assisted Image Segmentation (ADVIS) algorithm for active material discrimination. ADVIS selects high-purity, high-density pixels that are far in diffusion distance (a data-dependent metric) from other high-purity, high-density pixels in the hyperspectral image. The ground truth labels of these pixels are queried and propagated to the rest of the image. The ADVIS active learning algorithm is shown to strongly outperform its fully unsupervised clustering counterpart, suggesting that the incorporation of a very small number of carefully selected ground truth labels can result in substantially superior material discrimination in hyperspectral images.
    Receding Neuron Importances for Structured Pruning. (arXiv:2204.06404v1 [cs.LG])
    Structured pruning efficiently compresses networks by identifying and removing unimportant neurons. While this can be elegantly achieved by applying sparsity-inducing regularisation on BatchNorm parameters, an L1 penalty would shrink all scaling factors rather than just those of superfluous neurons. To tackle this issue, we introduce a simple BatchNorm variation with bounded scaling parameters, based on which we design a novel regularisation term that suppresses only neurons with low importance. Under our method, the weights of unnecessary neurons effectively recede, producing a polarised bimodal distribution of importances. We show that neural networks trained this way can be pruned to a larger extent and with less deterioration. We one-shot prune VGG and ResNet architectures at different ratios on the CIFAR and ImageNet datasets. In the case of VGG-style networks, our method significantly outperforms existing approaches, particularly under severe pruning regimes.
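    To make the idea of bounded scaling plus a polarising penalty concrete, here is a hypothetical stand-in, not the paper's actual formulation: the scale is bounded to (0, 1) via a sigmoid, and the penalty g(1-g) is minimised only when a scale sits at 0 or 1, so it pushes importances towards a bimodal distribution instead of shrinking all of them like L1:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bounded_bn(x, raw_gamma, beta, eps=1e-5):
    """BatchNorm over axis 0 whose per-channel scale is bounded to
    (0, 1) by a sigmoid of the raw parameter (illustrative variant)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return sigmoid(raw_gamma) * x_hat + beta

def receding_penalty(raw_gamma):
    """Polarising regulariser sketch: g*(1-g) is zero at g=0 or g=1,
    so it suppresses only mid-importance channels rather than all."""
    g = sigmoid(raw_gamma)
    return float(np.sum(g * (1.0 - g)))
```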
    The Exponentially Tilted Gaussian Prior for Variational Autoencoders. (arXiv:2111.15646v3 [cs.LG] UPDATED)
    An important property for deep neural networks is the ability to perform robust out-of-distribution detection on previously unseen data. This property is essential for safety purposes when deploying models in real-world applications. Recent studies show that probabilistic generative models can perform poorly on this task, which is surprising given that they seek to estimate the likelihood of the training data. To alleviate this issue, we propose the exponentially tilted Gaussian prior distribution for the Variational Autoencoder (VAE), which pulls points onto the surface of a hyper-sphere in latent space. This achieves state-of-the-art results on the area under the receiver operating characteristic curve (AUROC) metric using just the log-likelihood that the VAE naturally assigns. Because this prior is a simple modification of the traditional VAE prior, it is faster and easier to implement than competitive methods.
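    A way to see the hyper-sphere effect is via the unnormalised log-density: assuming the tilted prior has the form exp(tau*||z||) * N(z; 0, I) (my reading of "exponentially tilted", not confirmed by the abstract), the radial log-density -||z||^2/2 + tau*||z|| peaks at radius tau rather than at the origin:

```python
import numpy as np

def tilted_gaussian_logpdf_unnorm(z, tau):
    """Unnormalised log-density of an exponentially tilted standard
    Gaussian, exp(tau*||z||) * N(z; 0, I): the tilt moves the mode
    from the origin to the sphere ||z|| = tau. tau = 0 recovers the
    standard VAE prior."""
    r = float(np.linalg.norm(np.asarray(z, dtype=float)))
    return -0.5 * r ** 2 + tau * r
```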
    L3Cube-MahaNER: A Marathi Named Entity Recognition Dataset and BERT models. (arXiv:2204.06029v1 [cs.CL])
    Named Entity Recognition (NER) is a basic NLP task that finds major applications in conversational and search systems. It helps identify the key entities in a sentence that are used by downstream applications. NER and similar slot-filling systems for popular languages have been heavily used in commercial applications. In this work, we focus on Marathi, an Indian language spoken prominently by the people of Maharashtra state. Marathi is a low-resource language and still lacks useful NER resources. We present L3Cube-MahaNER, the first major gold standard named entity recognition dataset in Marathi. We also describe the manual annotation guidelines followed during the process. Finally, we benchmark the dataset on different CNN, LSTM, and Transformer based models like mBERT, XLM-RoBERTa, IndicBERT, MahaBERT, etc. MahaBERT provides the best performance among all the models. The data and models are available at https://github.com/l3cube-pune/MarathiNLP .
    Prediction of motor insurance claims occurrence as an imbalanced machine learning problem. (arXiv:2204.06109v1 [q-fin.ST])
    The insurance industry, with its large datasets, is a natural place to use big data solutions. However, it must be stressed that a significant number of machine learning applications in the insurance industry, such as fraud detection or claim prediction, deal with the problem of machine learning on an imbalanced dataset. This is due to the fact that frauds or claims are rare events compared with the entire population of drivers. The problem of imbalanced learning is often hard to overcome. Therefore, the main goal of this work is to present and apply various methods of dealing with an imbalanced dataset in the context of claim occurrence prediction in car insurance. In addition, these techniques are used to compare the results of different machine learning algorithms on this task. Our study covers the following techniques: logistic regression, decision tree, random forest, XGBoost, and feed-forward network. The problem is a classification one.
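    One of the simplest remedies for such imbalance, shown here purely for illustration (the abstract does not say which resampling methods the study uses), is random oversampling of the minority claim class so the classifier sees balanced classes:

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Randomly oversample every minority class (with replacement)
    up to the size of the largest class, balancing a claim/no-claim
    dataset before training."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    idx = []
    for c in classes:
        c_idx = np.flatnonzero(y == c)
        idx.append(rng.choice(c_idx, size=n_max, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]
```

    More refined alternatives (undersampling, SMOTE-style synthesis, class-weighted losses) follow the same pattern of reweighting the rare positive class.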
    Context-based Deep Learning Architecture with Optimal Integration Layer for Image Parsing. (arXiv:2204.06214v1 [cs.CV])
    Deep learning models have been efficient lately on image parsing tasks. However, deep learning models are not fully capable of exploiting visual and contextual information simultaneously. The proposed three-layer context-based deep architecture is capable of integrating context explicitly with visual information. The novel idea here is to have a visual layer to learn visual characteristics from binary class-based learners, a contextual layer to learn context, and then an integration layer to learn from both via genetic algorithm-based optimal fusion to produce a final decision. The experimental outcomes when evaluated on benchmark datasets are promising. Further analysis shows that optimized network weights can improve performance and make stable predictions.
    Deep Learning for Effective and Efficient Reduction of Large Adaptation Spaces in Self-Adaptive Systems. (arXiv:2204.06254v1 [cs.SE])
    Many software systems today face uncertain operating conditions, such as sudden changes in the availability of resources or unexpected user behavior. Without proper mitigation these uncertainties can jeopardize the system goals. Self-adaptation is a common approach to tackle such uncertainties. When the system goals may be compromised, the self-adaptive system has to select the best adaptation option to reconfigure by analyzing the possible adaptation options, i.e., the adaptation space. Yet, analyzing large adaptation spaces using rigorous methods can be resource- and time-consuming, or even be infeasible. One approach to tackle this problem is by using online machine learning to reduce adaptation spaces. However, existing approaches require domain expertise to perform feature engineering to define the learner, and support online adaptation space reduction only for specific goals. To tackle these limitations, we present 'Deep Learning for Adaptation Space Reduction Plus' -- DLASeR+ in short. DLASeR+ offers an extendable learning framework for online adaptation space reduction that does not require feature engineering, while supporting three common types of adaptation goals: threshold, optimization, and set-point goals. We evaluate DLASeR+ on two instances of an Internet-of-Things application with increasing sizes of adaptation spaces for different combinations of adaptation goals. We compare DLASeR+ with a baseline that applies exhaustive analysis and two state-of-the-art approaches for adaptation space reduction that rely on learning. Results show that DLASeR+ is effective with a negligible effect on the realization of the adaptation goals compared to an exhaustive analysis approach, and supports three common types of adaptation goals beyond the state-of-the-art approaches.
    Experimental Standards for Deep Learning Research: A Natural Language Processing Perspective. (arXiv:2204.06251v1 [cs.LG])
    The field of Deep Learning (DL) has undergone explosive growth during the last decade, with a substantial impact on Natural Language Processing (NLP) as well. Yet, as with other fields employing DL techniques, there has been a lack of common experimental standards compared to more established disciplines. Starting from fundamental scientific principles, we distill ongoing discussions on experimental standards in DL into a single, widely-applicable methodology. Following these best practices is crucial to strengthening experimental evidence, improving reproducibility, and enabling scientific progress. These standards are further collected in a public repository to help them transparently adapt to future needs.
    Enabling Synthetic Data adoption in regulated domains. (arXiv:2204.06297v1 [cs.LG])
    The switch from a Model-Centric to a Data-Centric mindset is putting emphasis on data and its quality rather than algorithms, bringing forward new challenges. In particular, the sensitive nature of the information in highly regulated scenarios needs to be accounted for. Specific approaches to address the privacy issue have been developed, such as Privacy Enhancing Technologies. However, they frequently cause loss of information, putting forward a crucial trade-off between data quality and privacy. A clever way to bypass such a conundrum relies on Synthetic Data: data obtained from a generative process that learns the real data properties. Both academia and industry have realized the importance of evaluating synthetic data quality: without all-round reliable metrics, the innovative data generation task has no proper objective function to maximize. Despite that, the topic remains under-explored. For this reason, we systematically catalog the important traits of synthetic data quality and privacy, and devise a specific methodology to test them. The result is DAISYnt (aDoption of Artificial Intelligence SYnthesis): a comprehensive suite of advanced tests, which sets a de facto standard for synthetic data evaluation. As a practical use-case, a variety of generative algorithms have been trained on real-world Credit Bureau Data. The best model has been assessed using DAISYnt on the different synthetic replicas. Further potential uses, among others, entail auditing and fine-tuning of generative models or ensuring high quality of a given synthetic dataset. From a prescriptive viewpoint, DAISYnt may eventually pave the way to synthetic data adoption in highly regulated domains, ranging from Finance to Healthcare, through Insurance and Education.
    Data-heterogeneity-aware Mixing for Decentralized Learning. (arXiv:2204.06477v1 [cs.LG])
    Decentralized learning provides an effective framework to train machine learning models with data distributed over arbitrary communication graphs. However, most existing approaches toward decentralized learning disregard the interaction between data heterogeneity and graph topology. In this paper, we characterize the dependence of convergence on the relationship between the mixing weights of the graph and the data heterogeneity across nodes. We propose a metric that quantifies the ability of a graph to mix the current gradients. We further prove that the metric controls the convergence rate, particularly in settings where the heterogeneity across nodes dominates the stochasticity between updates for a given node. Motivated by our analysis, we propose an approach that periodically and efficiently optimizes the metric using standard convex constrained optimization and sketching techniques. Through comprehensive experiments on standard computer vision and NLP benchmarks, we show that our approach leads to improvement in test performance for a wide range of tasks.
    Features of the Earth's seasonal hydroclimate: Characterizations and comparisons across the Koppen-Geiger climates and across continents. (arXiv:2204.06544v1 [stat.AP])
    Detailed feature investigations and comparisons across climates, continents and time series types can progress our understanding and modelling ability of the Earth's hydroclimate and its dynamics. As a step towards these important directions, we here propose and extensively apply a multifaceted and engineering-friendly methodological framework for the thorough characterization of seasonal hydroclimatic dependence, variability and change at the global scale. We apply this framework using over 13 000 quarterly temperature, precipitation and river flow time series. In these time series, the seasonal hydroclimatic behaviour is represented by 3-month means of earth-observed variables. In our analyses, we also adopt the well-established Koppen-Geiger climate classification system and define continental-scale regions with large or medium density of observational stations. In this context, we provide in parallel seasonal hydroclimatic feature summaries and comparisons in terms of autocorrelation, seasonality, temporal variation, entropy, long-range dependence and trends. We find notable differences to characterize the magnitudes of most of these features across the various Koppen-Geiger climate classes, as well as between several continental-scale geographical regions. We, therefore, deem that the consideration of the comparative summaries could be more beneficial in water resources engineering contexts than the also provided global summaries. Lastly, we apply explainable machine learning to compare the investigated features with respect to how informative they are in explaining and predicting either the main Koppen-Geiger climate or the continental-scale region, with the entropy, long-range dependence and trend features being (roughly) found to be less informative than the remaining ones at the seasonal time scale.
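    Two of the characterizations listed above have compact standard estimators; the sketch below shows them for a quarterly series (plain textbook definitions, not the paper's exact implementation):

```python
import numpy as np

def lag1_autocorrelation(x):
    """Lag-1 autocorrelation of a (e.g. quarterly) series: the ratio of
    the lag-1 autocovariance to the variance."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d[1:] * d[:-1]).sum() / (d * d).sum())

def linear_trend_slope(x):
    """Least-squares slope per time step, a simple trend summary."""
    t = np.arange(len(x), dtype=float)
    return float(np.polyfit(t, x, 1)[0])
```

    The remaining features (entropy, long-range dependence via e.g. the Hurst exponent) follow the same per-series pattern but need more involved estimators.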
    A Statistical Learning View of Simple Kriging. (arXiv:2202.07365v3 [stat.ML] UPDATED)
    In the Big Data era, with the ubiquity of geolocation sensors in particular, massive datasets exhibiting a possibly complex spatial dependence structure are becoming increasingly available. In this context, the standard probabilistic theory of statistical learning does not apply directly and guarantees of the generalization capacity of predictive rules learned from such data are left to establish. We analyze here the simple Kriging task, the flagship problem in Geostatistics: the values of a square integrable random field $X=\{X_s\}_{s\in S}$, $S\subset \mathbb{R}^2$, with unknown covariance structure are to be predicted with minimum quadratic risk, based upon observing a single realization of the spatial process at a finite number of locations $s_1,\; \ldots,\; s_n$ in $S$. Despite the connection of this minimization problem with kernel ridge regression, establishing the generalization capacity of empirical risk minimizers is far from straightforward, due to the non i.i.d. nature of the spatial data $X_{s_1},\; \ldots,\; X_{s_n}$ involved. In this article, nonasymptotic bounds of order $O_{\mathbb{P}}(1/n)$ are proved for the excess risk of a plug-in predictive rule mimicking the true minimizer in the case of isotropic stationary Gaussian processes observed at locations forming a regular grid. These theoretical results, as well as the role played by the technical conditions required to establish them, are illustrated by various numerical experiments and hopefully pave the way for further developments in statistical learning based on spatial data.
    Towards Practical Robustness Analysis for DNNs based on PAC-Model Learning. (arXiv:2101.10102v2 [cs.LG] UPDATED)
    To analyse local robustness properties of deep neural networks (DNNs), we present a practical framework from a model learning perspective. Based on black-box model learning with scenario optimisation, we abstract the local behaviour of a DNN via an affine model with the probably approximately correct (PAC) guarantee. From the learned model, we can infer the corresponding PAC-model robustness property. The innovation of our work is the integration of model learning into PAC robustness analysis: that is, we construct a PAC guarantee on the model level instead of sample distribution, which induces a more faithful and accurate robustness evaluation. This is in contrast to existing statistical methods without model learning. We implement our method in a prototypical tool named DeepPAC. As a black-box method, DeepPAC is scalable and efficient, especially when DNNs have complex structures or high-dimensional inputs. We extensively evaluate DeepPAC, with 4 baselines (using formal verification, statistical methods, testing and adversarial attack) and 20 DNN models across 3 datasets, including MNIST, CIFAR-10, and ImageNet. It is shown that DeepPAC outperforms the state-of-the-art statistical method PROVERO, and it achieves more practical robustness analysis than the formal verification tool ERAN. Also, its results are consistent with existing DNN testing work like DeepGini.
    Online greedy identification of linear dynamical systems. (arXiv:2204.06375v1 [stat.ML])
    This work addresses the problem of exploration in an unknown environment. For linear dynamical systems, we use an experimental design framework and introduce an online greedy policy where the control maximizes the information of the next step. In a setting with a limited number of experimental trials, our algorithm has low complexity and shows experimentally competitive performances compared to more elaborate gradient-based methods.
    Overparameterized Linear Regression under Adversarial Attacks. (arXiv:2204.06274v1 [stat.ML])
    As machine learning models start to be used in critical applications, their vulnerabilities and brittleness become a pressing concern. Adversarial attacks are a popular framework for studying these vulnerabilities. In this work, we study the error of linear regression in the face of adversarial attacks. We provide bounds on the error in terms of the traditional risk and the parameter norm, and show how these bounds can be leveraged to apply analyses from non-adversarial setups to the adversarial risk. The usefulness of these results is illustrated by shedding light on whether or not overparameterized linear models can be adversarially robust. We show that adding features to linear models might be either a source of additional robustness or brittleness. We show that these differences appear due to scaling and to how the $\ell_1$ and $\ell_2$ norms of random projections concentrate. We also show how the reformulation we propose allows for solving adversarial training as a convex optimization problem. This is then used as a tool to study how adversarial training and other regularization methods might affect the robustness of the estimated models.
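    For linear models the adversarial error admits a closed form, which is why the parameter norm appears in such bounds: the worst-case perturbation aligns with the weight vector, giving a dual-norm term. A sketch (standard result; the function name is mine):

```python
import numpy as np

def adversarial_sq_error(w, x, y, eps, p=2):
    """Worst-case squared error of the linear predictor w.x under an
    l_p-bounded input perturbation ||delta||_p <= eps. The supremum is
    (|w.x - y| + eps * ||w||_q)^2, where q is the dual norm of p
    (1/p + 1/q = 1), so the parameter norm directly controls the
    adversarial risk."""
    if p == np.inf:
        q = 1.0
    elif p == 1:
        q = np.inf
    else:
        q = p / (p - 1.0)
    dual_norm = np.linalg.norm(w, ord=q)
    return float((abs(float(w @ x) - y) + eps * dual_norm) ** 2)
```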
    The sparse Polynomial Chaos expansion: a fully Bayesian approach with joint priors on the coefficients and global selection of terms. (arXiv:2204.06043v1 [stat.CO])
    Polynomial chaos expansion (PCE) is a versatile tool widely used in uncertainty quantification and machine learning, but its successful application depends strongly on the accuracy and reliability of the resulting PCE-based response surface. High accuracy typically requires high polynomial degrees, demanding many training points, especially in high-dimensional problems, due to the curse of dimensionality. So-called sparse PCE concepts work with a much smaller selection of basis polynomials compared to conventional PCE approaches and can overcome the curse of dimensionality very efficiently, but have to pay specific attention to their strategies of choosing training points. Furthermore, the approximation error resembles an uncertainty that most existing PCE-based methods do not estimate. In this study, we develop and evaluate a fully Bayesian approach to establish the PCE representation via joint shrinkage priors and Markov chain Monte Carlo. The suggested Bayesian PCE model directly aims to solve the two challenges named above: achieving a sparse PCE representation and estimating the uncertainty of the PCE itself. The embedded Bayesian regularization via the joint shrinkage prior allows using higher polynomial degrees for given training points due to its ability to handle underdetermined situations, where the number of considered PCE coefficients could be much larger than the number of available training points. We also explore multiple variable selection methods to construct sparse PCE expansions based on the established Bayesian representations, while globally selecting the most meaningful orthonormal polynomials given the available training data. We demonstrate the advantages of our Bayesian PCE and the corresponding sparsity-inducing methods on several benchmarks.
    A quantum generative model for multi-dimensional time series using Hamiltonian learning. (arXiv:2204.06150v1 [quant-ph])
    Synthetic data generation has proven to be a promising solution for addressing data availability issues in various domains. Even more challenging is the generation of synthetic time series data, where one has to preserve temporal dynamics, i.e., the generated time series must respect the original relationships between variables across time. Recently proposed techniques such as generative adversarial networks (GANs) and quantum-GANs lack the ability to attend to the time series specific temporal correlations adequately. We propose using the inherent nature of quantum computers to simulate quantum dynamics as a technique to encode such features. We start by assuming that a given time series can be generated by a quantum process, after which we proceed to learn that quantum process using quantum machine learning. We then use the learned model to generate out-of-sample time series and show that it captures unique and complex features of the learned time series. We also study the class of time series that can be modeled using this technique. Finally, we experimentally demonstrate the proposed algorithm on an 11-qubit trapped-ion quantum machine.
    Generalization Error Bounds for Multiclass Sparse Linear Classifiers. (arXiv:2204.06264v1 [math.ST])
    We consider high-dimensional multiclass classification by sparse multinomial logistic regression. Unlike binary classification, in the multiclass setup one can think about an entire spectrum of possible notions of sparsity associated with different structural assumptions on the regression coefficients matrix. We propose a computationally feasible feature selection procedure based on penalized maximum likelihood with convex penalties capturing a specific type of sparsity at hand. In particular, we consider global sparsity, double row-wise sparsity, and low-rank sparsity, and show that with the properly chosen tuning parameters the derived plug-in classifiers attain the minimax generalization error bounds (in terms of misclassification excess risk) within the corresponding classes of multiclass sparse linear classifiers. The developed approach is general and can be adapted to other types of sparsity as well.
    Epistemic Neural Networks. (arXiv:2107.08924v2 [cs.LG] UPDATED)
    Effective decision, exploration, and adaptation often require an agent to know what it knows and, also, what it does not know. This capability relies on the quality of \textit{joint} predictions of labels assigned to multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. By assessing the quality of joint predictions it is possible to determine whether a neural network effectively distinguishes between epistemic uncertainty (that due to lack of knowledge) and aleatoric uncertainty (that due to chance). We introduce the \textit{epistemic neural network} (ENN) as a general interface for uncertainty modeling in deep learning. While prior approaches to uncertainty modeling can be viewed as ENNs, the new interface facilitates comparison of joint predictions, and the design of novel architectures and algorithms. In particular, we introduce the \textit{epinet}: an architecture that can supplement any existing neural network, including pretrained models, and can be trained with modest incremental computation to represent uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and sequential decision problems. As part of this effort we open-source experiment code.
    Approximate Bayesian Computation via Classification. (arXiv:2111.11507v3 [stat.ME] UPDATED)
    Approximate Bayesian Computation (ABC) enables statistical inference in simulator-based models whose likelihoods are difficult to calculate but easy to simulate from. ABC constructs a kernel-type approximation to the posterior distribution through an accept/reject mechanism which compares summary statistics of real and simulated data. To obviate the need for summary statistics, we directly compare empirical distributions with a Kullback-Leibler (KL) divergence estimator obtained via contrastive learning. In particular, we blend flexible machine learning classifiers within ABC to automate fake/real data comparisons. We consider the traditional accept/reject kernel as well as an exponential weighting scheme which does not require the ABC acceptance threshold. Our theoretical results show that the rate at which our ABC posterior distributions concentrate around the true parameter depends on the estimation error of the classifier. We derive limiting posterior shape results and find that, with a properly scaled exponential kernel, asymptotic normality holds. We demonstrate the usefulness of our approach on simulated examples as well as real data in the context of stock volatility estimation.
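    The accept/reject mechanism described above is the classical rejection-ABC kernel. A minimal stdlib-Python sketch for a toy Gaussian-mean problem (the simulator, prior, summary statistic, and tolerance here are illustrative assumptions; the paper's classifier-based KL comparison is not reproduced):

```python
import random
import statistics

random.seed(0)

def simulate(theta, n=100):
    """Simulator: n draws from N(theta, 1); ABC only needs sampling, not a likelihood."""
    return [random.gauss(theta, 1.0) for _ in range(n)]

# "Real" data generated with an unknown mean we try to recover.
true_theta = 2.0
observed = simulate(true_theta)
s_obs = statistics.mean(observed)  # summary statistic of the real data

# Rejection ABC: draw theta from the prior, simulate, accept when the
# summary statistics of fake and real data are within tolerance eps.
eps, posterior = 0.2, []
for _ in range(5000):
    theta = random.uniform(-5, 5)                # uniform prior
    s_sim = statistics.mean(simulate(theta))
    if abs(s_sim - s_obs) < eps:
        posterior.append(theta)

print(len(posterior), statistics.mean(posterior))
# The accepted draws approximate the ABC posterior; their mean should
# sit near the true value of 2.0.
```

    The paper's contribution replaces the summary-statistic distance with a learned classifier-based divergence, removing the need to hand-pick `s_obs`.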
    Random Graph Embedding and Joint Sparse Regularization for Multi-label Feature Selection. (arXiv:2204.06445v1 [stat.ML])
    Multi-label learning is often used to mine the correlation between variables and multiple labels, and its research focuses on fully extracting the information between variables and labels. The $\ell_{2,1}$ regularization is often used to obtain a sparse coefficient matrix, but it cannot effectively address multicollinearity among variables. In this paper, the proposed model can choose the most relevant variables by solving a joint constrained optimization problem using the $\ell_{2,1}$ regularization and Frobenius regularization. In manifold regularization, we carry out a random walk strategy based on the joint structure to construct a neighborhood graph, which is highly robust to outliers. In addition, we give an iterative algorithm for the proposed method and prove its convergence. The experiments on real-world data sets also show that the comprehensive performance of our method is consistently better than that of classical methods.
    Discovering Diverse Solutions in Deep Reinforcement Learning by Maximizing State-Action-Based Mutual Information. (arXiv:2103.07084v2 [stat.ML] UPDATED)
    Reinforcement learning algorithms are typically limited to learning a single solution for a specified task, even though diverse solutions often exist. Recent studies showed that learning a set of diverse solutions is beneficial because diversity enables robust few-shot adaptation. Although existing methods learn diverse solutions by using the mutual information as unsupervised rewards, such an approach often suffers from the bias of the gradient estimator induced by value function approximation. In this study, we propose a novel method that can learn diverse solutions without suffering the bias problem. In our method, a policy conditioned on a continuous or discrete latent variable is trained by directly maximizing the variational lower bound of the mutual information, instead of using the mutual information as unsupervised rewards as in previous studies. Through extensive experiments on robot locomotion tasks, we demonstrate that the proposed method successfully learns an infinite set of diverse solutions by learning continuous latent variables, which is more challenging than learning a finite number of solutions. Subsequently, we show that our method enables more effective few-shot adaptation compared with existing methods.
    Utilizing variational autoencoders in the Bayesian inverse problem of photoacoustic tomography. (arXiv:2204.06270v1 [physics.comp-ph])
    There has been an increasing interest in utilizing machine learning methods in inverse problems and imaging. Most of the work has, however, concentrated on image reconstruction problems, and the number of studies regarding the full solution of the inverse problem is limited. In this work, we study a machine learning based approach for the Bayesian inverse problem of photoacoustic tomography. We develop an approach for estimating the posterior distribution in photoacoustic tomography using a variational autoencoder. The approach is evaluated with numerical simulations and compared to the solution of the inverse problem using a Bayesian approach.
    Time-uniform central limit theory with applications to anytime-valid causal inference. (arXiv:2103.06476v3 [math.ST] UPDATED)
    This work introduces time-uniform analogues of confidence intervals based on the central limit theorem (CLT). Our methods take the form of confidence sequences (CS) -- sequences of confidence intervals that are uniformly valid over time. CSs provide valid inference at arbitrary stopping times, incurring no penalties for "peeking" at the data, unlike classical confidence intervals which require the sample size to be fixed in advance. Existing CSs in the literature are nonasymptotic, requiring strong assumptions on the data, while the classical (fixed-time) CLT is ubiquitous due to the weak assumptions it imposes. Our work bridges the gap by introducing time-uniform CSs that only require CLT-like assumptions. While the CLT approximates the distribution of a sample average by that of a Gaussian at a fixed sample size, we use strong invariance principles like the seminal work of Koml\'os, Major, and Tusn\'ady to uniformly approximate the entire sample average process by an implicit Brownian motion. Applying Robbins' normal mixture martingale method to this Brownian motion then yields closed-form time-uniform boundaries. We combine these boundaries with doubly robust estimators to derive nonparametric CSs for the average treatment effect (and other causal estimands). These allow randomized experiments and observational studies to be continuously monitored and adaptively stopped, all while controlling the type-I error.
    Sigma-Delta and Distributed Noise-Shaping Quantization Methods for Random Fourier Features. (arXiv:2106.02614v2 [cs.LG] UPDATED)
    We propose the use of low bit-depth Sigma-Delta and distributed noise-shaping methods for quantizing the Random Fourier features (RFFs) associated with shift-invariant kernels. We prove that our quantized RFFs -- even in the case of $1$-bit quantization -- allow a high accuracy approximation of the underlying kernels, and the approximation error decays at least polynomially fast as the dimension of the RFFs increases. We also show that the quantized RFFs can be further compressed, yielding an excellent trade-off between memory use and accuracy. Namely, the approximation error now decays exponentially as a function of the bits used. Moreover, we empirically show by testing the performance of our methods on several machine learning tasks that our method compares favorably to other state of the art quantization methods in this context.
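    For context, the features being quantized are the standard random Fourier features of Rahimi and Recht. A minimal numpy sketch of the unquantized RFF approximation to a Gaussian kernel (dimensions, kernel width, and seed are arbitrary choices; the Sigma-Delta and noise-shaping quantizers themselves are not shown):

```python
import numpy as np

rng = np.random.default_rng(0)
d, D, gamma = 5, 4000, 0.5  # input dim, number of features, kernel width

# Random Fourier features for the Gaussian kernel
# k(x, y) = exp(-gamma * ||x - y||^2): sample W ~ N(0, 2*gamma*I) and
# b ~ U[0, 2*pi], then map z(x) = sqrt(2/D) * cos(W x + b).
W = rng.normal(0.0, np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)

def rff(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.standard_normal(d), rng.standard_normal(d)
exact = np.exp(-gamma * np.sum((x - y) ** 2))
approx = rff(x) @ rff(y)
print(exact, approx)
# The inner product of the feature maps concentrates around the exact
# kernel value as D grows; the paper studies how few bits per feature
# one can keep while preserving this approximation.
```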
    Approximating Continuous Functions on Persistence Diagrams Using Template Functions. (arXiv:1902.07190v3 [cs.CG] UPDATED)
    The persistence diagram is an increasingly useful tool from Topological Data Analysis, but its use alongside typical machine learning techniques requires mathematical finesse. The most success to date has come from methods that map persistence diagrams into vector spaces, in a way which maximizes the structure preserved. This process is commonly referred to as featurization. In this paper, we describe a mathematical framework for featurization called \emph{template functions}, and we show that it addresses the problem of approximating continuous functions on compact subsets of the space of persistence diagrams. Specifically, we begin by characterizing relative compactness with respect to the bottleneck distance, and then provide explicit theoretical methods for constructing compact-open dense subsets of continuous functions on persistence diagrams. These dense subsets -- obtained via template functions -- are leveraged for supervised learning tasks with persistence diagrams. Specifically, we test the method for classification and regression algorithms on several examples including shape data and dynamical systems.
    Deep Probabilistic Time Series Forecasting using Augmented Recurrent Input for Dynamic Systems. (arXiv:2106.05848v2 [cs.LG] UPDATED)
    Demand for probabilistic time series forecasting has recently arisen in various dynamic system scenarios, for example, system identification and prognostics and health management of machines. To this end, we combine the advances in both deep generative models and state space models (SSM) to come up with a novel, data-driven deep probabilistic sequence model. Specifically, we follow the popular encoder-decoder generative structure to build a recurrent neural network (RNN) assisted variational sequence model on an augmented recurrent input space, which can induce rich stochastic sequence dependency. In addition, to alleviate the inconsistency of the posterior between training and predicting and to improve the mining of dynamic patterns, we (i) propose using a lagged hybrid output as input for the posterior at the next time step, which brings training and predicting into alignment; and (ii) further devise a generalized auto-regressive strategy that encodes all the historical dependencies for the posterior. Thereafter, we first investigate the methodological characteristics of the proposed deep probabilistic sequence model on toy cases, and then comprehensively demonstrate its superiority over existing deep probabilistic SSM models through extensive numerical experiments on eight system identification benchmarks from various dynamic systems. Finally, we apply our sequence model to a real-world centrifugal compressor forecasting problem, and again verify its outstanding performance by quantifying the time series predictive distribution.
    Efficient Non-parametric Bayesian Hawkes Processes. (arXiv:1810.03730v5 [cs.LG] UPDATED)
    In this paper, we develop an efficient nonparametric Bayesian estimation of the kernel function of Hawkes processes. The nonparametric Bayesian approach is important because it provides flexible Hawkes kernels and quantifies their uncertainty. Our method is based on the cluster representation of Hawkes processes. Utilizing the finite support assumption of the Hawkes process, we efficiently sample random branching structures and thus split the Hawkes process into clusters of Poisson processes. We derive two algorithms -- a block Gibbs sampler and a maximum a posteriori estimator based on expectation maximization -- and we show that our methods have a linear time complexity, both theoretically and empirically. On synthetic data, we show that our methods can infer flexible Hawkes triggering kernels. On two large-scale Twitter diffusion datasets, we show that our methods outperform the current state-of-the-art in goodness-of-fit and that the time complexity is linear in the size of the dataset. We also observe that on diffusions related to online videos, the learned kernels reflect the perceived longevity of different content types such as music or pet videos.
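    As background for the estimation problem, the process whose triggering kernel is being learned can be simulated with Ogata's thinning algorithm. A minimal stdlib sketch with an exponential kernel (parameter values are illustrative, and this is not the paper's Gibbs sampler or EM estimator):

```python
import math
import random

random.seed(0)

def simulate_hawkes(mu=0.5, alpha=0.8, beta=1.0, T=100.0):
    """Ogata thinning for a Hawkes process with exponential triggering
    kernel phi(t) = alpha * beta * exp(-beta * t); branching ratio alpha."""
    events, t = [], 0.0
    while t < T:
        # Current intensity dominates the (decaying) intensity until the
        # next event, so it serves as the thinning upper bound.
        lam_bar = mu + sum(alpha * beta * math.exp(-beta * (t - s)) for s in events)
        t += random.expovariate(lam_bar)   # candidate inter-arrival time
        if t >= T:
            break
        lam_t = mu + sum(alpha * beta * math.exp(-beta * (t - s)) for s in events)
        if random.random() <= lam_t / lam_bar:   # accept w.p. lam(t)/lam_bar
            events.append(t)
    return events

ev = simulate_hawkes()
print(len(ev))
# The expected event count is mu * T / (1 - alpha) = 250 for these values,
# reflecting the self-exciting amplification.
```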
    Encoding Domain Knowledge in Multi-view Latent Variable Models: A Bayesian Approach with Structured Sparsity. (arXiv:2204.06242v1 [stat.ML])
    Many real-world systems are described not only by data from a single source but via multiple data views. For example, in genomic medicine, a patient can be described by data from different molecular layers. This raises the need for multi-view models that are able to disentangle variation within and across data views in an interpretable manner. Latent variable models with structured sparsity are a commonly used tool to address this modeling task but interpretability is cumbersome since it requires a direct inspection and interpretation of each factor via a specialized domain expert. Here, we propose MuVI, a novel approach for domain-informed multi-view latent variable models, facilitating the analysis of multi-view data in an inherently explainable manner. We demonstrate that our model (i) is able to integrate noisy domain expertise in form of feature sets, (ii) is robust to noise in the encoded domain knowledge, (iii) results in identifiable factors and (iv) is able to infer interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.
    Time series features for supporting hydrometeorological explorations and predictions in ungauged locations using large datasets. (arXiv:2204.06540v1 [stat.ME])
    Regression-based frameworks for streamflow regionalization are built around catchment attributes that traditionally originate from catchment hydrology, flood frequency analysis and their interplay. In this work, we deviated from this traditional path by formulating and extensively investigating the first regression-based streamflow regionalization frameworks that largely emerge from general-purpose time series features for data science and, more precisely, from a large variety of such features. We focused on 28 features that included (partial) autocorrelation, entropy, temporal variation, seasonality, trend, lumpiness, stability, nonlinearity, linearity, spikiness, curvature and others. We estimated these features for daily temperature, precipitation and streamflow time series from 511 catchments, and then merged them within regionalization contexts with traditional topographic, land cover, soil and geologic attributes. Precipitation and temperature features (e.g., the spectral entropy, seasonality strength and lag-1 autocorrelation of the precipitation time series, and the stability and trend strength of the temperature time series) were found to be useful predictors of many streamflow features. The same applies to traditional attributes, such as the catchment mean elevation. Relationships between predictor and dependent variables were also revealed, while the spectral entropy, the seasonality strength and several autocorrelation features of the streamflow time series were found to be more regionalizable than others.
    SRMD: Sparse Random Mode Decomposition. (arXiv:2204.06108v1 [eess.SP])
    Signal decomposition and multiscale signal analysis provide many useful tools for time-frequency analysis. We propose a random feature method for analyzing time-series data by constructing a sparse approximation to the spectrogram. The randomization is both in the time window locations and the frequency sampling, which lowers the overall sampling and computational cost. The sparsification of the spectrogram leads to a sharp separation between time-frequency clusters, which makes it easier to identify intrinsic modes and thus leads to a new data-driven mode decomposition. The applications include signal representation, outlier removal, and mode decomposition. On the benchmark tests, we show that our approach outperforms other state-of-the-art decomposition methods.
    GenIE: Generative Information Extraction. (arXiv:2112.08340v3 [cs.CL] UPDATED)
    Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code, data and models available at https://github.com/epfl-dlab/GenIE.
    Estimators of Entropy and Information via Inference in Probabilistic Models. (arXiv:2202.12363v2 [stat.ML] UPDATED)
    Estimating information-theoretic quantities such as entropy and mutual information is central to many problems in statistics and machine learning, but challenging in high dimensions. This paper presents estimators of entropy via inference (EEVI), which deliver upper and lower bounds on many information quantities for arbitrary variables in a probabilistic generative model. These estimators use importance sampling with proposal distribution families that include amortized variational inference and sequential Monte Carlo, which can be tailored to the target model and used to squeeze true information values with high accuracy. We present several theoretical properties of EEVI and demonstrate scalability and efficacy on two problems from the medical domain: (i) in an expert system for diagnosing liver disorders, we rank medical tests according to how informative they are about latent diseases, given a pattern of observed symptoms and patient attributes; and (ii) in a differential equation model of carbohydrate metabolism, we find optimal times to take blood glucose measurements that maximize information about a diabetic patient's insulin sensitivity, given their meal and medication schedule.

    How flat is a normal mixture on top?
    Male and female heights both have a standard deviation of about 3 inches, with means of 70 inches and 64 inches. That’s a good first-pass model using round numbers. If you ask what the height of an average adult is, not specifying male or female, you get a mixture of two normal distributions. If we […] How flat is a normal mixture on top? first appeared on John D. Cook.  ( 2 min )
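    With the round numbers quoted here, the two means are exactly two standard deviations apart, which is the boundary case at which an equal-weight normal mixture is just barely unimodal and its density has zero curvature at the midpoint. A quick stdlib check (the 50/50 mixing weight is an assumption on top of the post's numbers):

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(x, mus=(64.0, 70.0), sigma=3.0, weights=(0.5, 0.5)):
    """Equal-weight two-component normal mixture of adult heights."""
    return sum(w * normal_pdf(x, mu, sigma) for w, mu in zip(weights, mus))

# Density at the two component means and at the midpoint (67 inches).
f64, f67, f70 = mixture_pdf(64), mixture_pdf(67), mixture_pdf(70)
print(f64, f67, f70)
# By symmetry f64 == f70. Each component's second derivative vanishes
# one sigma from its mean, so the mixture's curvature at 67 is exactly
# zero: the single mode sits on a very flat top.
```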

    Self-Supervised MultiModal Versatile Networks
    Highlights: Introduce the notion of a MultiModal Versatile (MMV) network that can ingest multiple modalities and output common representations useful for downstream tasks; the paper especially studies how best to combine the modalities so that the common representations respect properties deemed useful by the authors. Introduce the process of deflation, to efficiently apply networks trained on video data to static images.  ( 3 min )


    Best sample text for voice synthesis? [D]
    I'm planning to create a clone of my own voice. Is there some kind of ideal sample text to record? I need 300 sentences. submitted by /u/headwar [link] [comments]
    [R] ML model to generate paths (lines) for a given image
    I am a researcher working on creating paths to indicate the primary and secondary neuronal connections in Corneal Confocal Microscopy images. The ground truth I have is images and sets of two-dimensional lines that indicate the primary and secondary paths as indicated in the image below (ignore the green dots). The secondary and primary paths are always connected to each other. https://preview.redd.it/r025qcpjddt81.jpg?width=834&format=pjpg&auto=webp&s=11a641988e0c6f8c1e17290fe9588f89ef530635 I am looking to find the most appropriate models to use for this task. The first thing that came to my mind was semantic segmentation. However, I am looking for other approaches that can be more suitable for drawing 1-pixel lines, especially since the ground truth paths are indicated as 1-pixel-wide lines (1-pixel thickness) but the connections in the images have wider thicknesses. Any ideas for architectures or methods? submitted by /u/madr3z [link] [comments]  ( 2 min )
    [N] Followup response from BAAI on "A Roadmap for Big Model"
    Source: https://www.baai.ac.cn/portal/article/index/cid/4/id/404.html Statement on the Alleged Plagiarism by “A Roadmap for Big Model” It has come to our attention that the survey report “A Roadmap for Big Model” uploaded on arXiv by a BAAI team is suspected of plagiarism. Immediately upon learning of the allegations, an internal investigation was organized to confirm the issue. BAAI is also initiating an independent review by third-party experts to further assess the issue and accountability. As a research institution that attaches great importance to academic standards, BAAI holds a zero-tolerance policy towards academic misconduct. We express our sincerest apologies to the authors of the original papers and to all of those affected. The report in question constitutes a collection …  ( 2 min )
    [P] Image Restoration Using Swin Transformer in JavaScript
    Important note: Right now, the model only supports upsampling from any dimension to at most 256 pixels. I'll likely fix this restriction in the next few days. A few days back, I was searching for AI-based image upsampling models for use within an offline JavaScript app. The latest approaches, such as SwinIR, were unavailable for JavaScript, so I created a notebook that converts the SwinIR model from torch to TFJS in a relatively short Kaggle kernel. I believe other transformer architectures can also be ported to JS like this. This is the link to the original paper of SwinIR. It requires around 1 GB of RAM to run. The model folder is 44 MB. It is quantized to float16. Anyway, hope someone will find this useful for their website or some other app. submitted by /u/Deep-Station-1746 [link] [comments]  ( 1 min )
    [D] Replacing 3x3 convolutions with two 2x2 convolutions
    Something that's always puzzled me is the ubiquity of 3x3 convolutions in computer vision. If I recall past discussion accurately, the main benefits of odd-sized kernels are that With the proper padding they maintain the width and height of their inputs, which makes it easier to think about/design neural network architectures. This is not possible with even kernels, unless you bite the bullet and use asymmetric padding (which is rejected due to aesthetic reasons) Output pixels have a 1-to-1 mapping with input pixels (since odd-sized kernels have a proper "center"). This is considered a nice property -- perhaps (for instance) avoiding aliasing issues during segmentation tasks. Given these two points, we use 3x3 convolutions since they're the smallest odd-sized filter (exclud…  ( 6 min )
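    The parameter trade-off behind the question is easy to tabulate: with stride 1 and equal channel counts, two stacked 2x2 convolutions cover the same 3x3 receptive field as one 3x3 convolution (each extra 2x2 layer grows the field by one pixel) while using fewer weights, 8C² vs 9C². A back-of-envelope check (the channel count and bias terms are arbitrary illustrative assumptions):

```python
def conv_params(k, c_in, c_out, bias=True):
    """Parameter count of a single k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

C = 64  # channels, an arbitrary illustrative choice

# One 3x3 conv vs a stack of two 2x2 convs: same 3x3 receptive field,
# but the stack carries roughly 8/9 of the weights (plus an extra bias
# vector and an extra nonlinearity in between).
one_3x3 = conv_params(3, C, C)
two_2x2 = 2 * conv_params(2, C, C)
print(one_3x3, two_2x2, two_2x2 / one_3x3)
```

    This mirrors the classic VGG argument for two 3x3 convs over one 5x5; the asymmetric-padding objection in the post is about alignment, not capacity.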
    [R] Do Deep Neural Networks Contribute to Multivariate Time Series Anomaly Detection ?
    submitted by /u/MVTS_Ano [link] [comments]  ( 1 min )
    [D] Open problem in modern RL that doesn't need a massive computational resources
    What are open and/or interesting problems in modern reinforcement learning that can be tackled by the average PhD/PostDoc who doesn't have access to a massive compute cluster? The problem shouldn't need us to train our model for 10 months like OpenAI's Dota2 model. Please share your thoughts. submitted by /u/ginger_beer_m [link] [comments]  ( 1 min )
    [D] What are the biggest developments in CV in last 5 years?
    I've been helping a friend of mine learn about CV, but my knowledge starts getting spotty around 2017-2018. In this spirit I'm hoping to discuss the biggest developments in CV in the last 5 years. I know that ViTs have been developed in that time but I'm hoping to fill in my knowledge gaps! A list of topics, important papers, big ideas, or anything else is much appreciated! ​ Edit: Thank you for the discussion everyone 🙌! I've been in and out of meetings but am reading through all responses now submitted by /u/SleekEagle [link] [comments]  ( 3 min )
    [P] How and where do you serve your model? Using kubernetes, docker, metal? Self developed or existing tools?
    Hi, I'm a machine learning platform engineer. I've been using, exploring, and developing model deployment tools and platforms for several years. Very often, I've found that many of these tools, or managed AI platform services, are not very welcome among users. Some think these tools are unnecessarily complicated. I'm currently developing a library in my free time trying to fill the gap. And I also want the library to be well integrated with most users' deployment environments. Would you like to share how and where you serve your model? Using kubernetes? Self developed or existing tools? Thanks~ P.S. If you are interested, you can visit my project to submit an issue/PR or join the discussions, welcome to help: Pinferencia View Poll submitted by /u/Remote_Cancel_7977 [link] [comments]  ( 1 min )
    [P] I created a YouTube Thumbnail Dataset, and need some insight
    Hi guys! I recently created & published a dataset of YouTube video thumbnails on Kaggle (YouTube Thumbnail Dataset), I've tried to make the dataset as diverse as possible, It contains thumbnails from all varieties of YouTube channels. This dataset goes hand in hand with another dataset (containing YouTube video annotations) that I created, namely YouTubers Saying Things. The dataset contains 91 unique YouTube channels, and 10 categories, these categories are assigned by me manually to these channels. (Comedy, Science, Automobile, VideoGames, Food, Entertainment, Informative, Blog, News, Tech) All kinds of feedback and criticism are welcome, and also if you guys want some particular channel to be included in both these datasets, feel free to comment on this post, or raise an issue on the Github repositories for both these datasets, I will surely add them in the next version. Links to the datasets: YouTubers Saying Things Kaggle, Github YouTube Thumbnail Dataset Kaggle, Github submitted by /u/alcatraz2217 [link] [comments]  ( 1 min )
    Improve XGBoost classification algorithm with small dataset, based on similar bigger dataset? [D]
    Hi, I am doing research on transfer learning for XGBoost. I am currently working with a small dataset from a company in Spain (short history) and the scoring is poor. I have worked before with the same company in France, and the scoring was great as I had plenty of data thanks to a long history. How could I improve my score with the data from Spain with the help of the data from France? Could I use transfer learning, or data mutualization, or data augmentation? If anyone has faced a similar problem before, or has read some papers about it, I would love to hear about it. Thank you! submitted by /u/Cutset [link] [comments]  ( 1 min )
    [R] A Modern Self-Referential Weight Matrix That Learns to Modify Itself
    submitted by /u/hardmaru [link] [comments]
    [D] What are some interesting hidden stuff about CNNs?
    Hey all, I'm trying to get up to date with Deep Learning literature, so last week I was going through CNNs. Here's a general view of what I've learned so far. Large filters suck, you can get better accuracy with smaller filters and more non-linearities Depth is the most important for CNNs over width or filter sizes. ReLU activation is generally better, as Sigmoid/tanh gradients tend to fall off towards the ends Convolution layers are only translation equivariant. Stacking multiple features together and passing them through MaxPool helps with rotational invariance and scaling, although not completely Residual connections help address vanishing gradients and improve the overall training procedure Inception models worked well, as they mixed different filter sizes together, helping the model learn diverse features Most current work is with Transformers, although I'm not sure why. ConvNext shows similar performance can be achieved through large CNNs Do add to this if I missed anything, or if there's anything you don't know about submitted by /u/Bibbidi_Babbidi_Boo [link] [comments]  ( 5 min )
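    The point about saturating sigmoid/tanh gradients can be made concrete in a few lines of stdlib Python: the sigmoid's derivative peaks at 0.25 and decays rapidly in the tails, so stacking many sigmoid layers multiplies small factors together, while ReLU's derivative stays at 1 on the active side:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)   # maximum 0.25 at x = 0, tiny in the tails

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # constant 1 wherever the unit is active

for x in (0.0, 2.0, 5.0, 10.0):
    print(x, sigmoid_grad(x), relu_grad(x))
# sigmoid_grad(10) is already about 4.5e-5; chained through many layers,
# such factors shrink the gradient exponentially, which is the vanishing
# gradient that residual connections and ReLU both mitigate.
```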
    [D] How to create scenes with text - Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors, a 5-minute paper summary by Casual GAN Papers
    The authors of Make-A-Scene propose a novel text-to-image method that leverages the information from an additional input condition called a “scene” in the form of segmentation tokens to improve the quality of generated images and enable scene editing, out-of-distribution prompts, and text-editing of anchor scenes. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/284 Blog post: https://www.casualganpapers.com/text-to-image-vqvae-scene-generation/Make-A-Scene-explained.html Make-A-Scene arxiv / code (by Casual GAN Papers Community) Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )

    Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed
    I’m sure we all remember the story of “The Little Engine That Could.” A little railroad engine was built for pulling a few cars on and off the switches. When the more powerful engines were asked to pull a load over a steep hill, they responded “I can’t; that is too much of a pull for me”.… Read More »Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed The post Fallacy of Becoming Data-driven – Part 1: Becoming Value-obsessed appeared first on Data Science Central.  ( 6 min )
    DSC Weekly Digest 4/12/2022: Demographics Drives Analytics
    The Los Angeles Times recently reported on a growing problem not just for California School Districts, but across much of the Northern Hemisphere: The number of children entering school has been dropping steadily for five years now, and is changing the dynamics of education. What’s worse, those declines are accelerating. Sometimes understanding the future comes… Read More »DSC Weekly Digest 4/12/2022: Demographics Drives Analytics The post DSC Weekly Digest 4/12/2022: Demographics Drives Analytics appeared first on Data Science Central.  ( 6 min )

    Control access to Amazon SageMaker Feature Store offline using AWS Lake Formation
    You can establish feature stores to provide a central repository for machine learning (ML) features that can be shared with data science teams across your organization for training, batch scoring, and real-time inference. Data science teams can reuse features stored in the central repository, avoiding the need to reengineer feature pipelines for different projects and […]  ( 10 min )
    Manage dialog to elicit Amazon Lex slots in Amazon Connect contact flows
    Amazon Lex can add powerful automation to contact center solutions, so you can enable self-service via interactive voice response (IVR) interactions or route calls to the appropriate agent based on caller input. These capabilities can increase customer satisfaction by streamlining the user experience, and improve containment rates in the contact center. In both the self-service […]  ( 6 min )

    AI Trippy Dream 35 - Psychedelic Special Request
    submitted by /u/LordPewPew777 [link] [comments]
    Ohio State University Researchers Develop SAT2LoD2: An Open-Source Python Tool For 3D Landscape Modelling Using Satellite Imagery
    3D landscape modeling has risen in popularity in recent years, with countless applications in civil engineering, earth sciences, the military, and many other fields. Geometric 3D models are typically developed using the city geography markup language (CityGML), and the Level-of-Detail (LoD) building model is the preferred model for building 3D models using CityGML. The use of satellite imagery for landscape modeling provides the advantage of covering a wide area at low cost. However, developing LoD2 models from satellite imagery remains a big challenge: building models in such a way involves complex steps demanding heuristics-based approaches and ML-based detection paradigms. In a recent paper, researchers at The Ohio State University propose SAT2LoD2 to facilitate the development of 3D landscape models. SAT2LoD2 is an open-source, Python-based, GUI-enabled software tool that takes satellite images as inputs and returns LoD2 building models as outputs. The software can also take road networks and custom maps as additional inputs for better results. Continue Reading Paper: https://arxiv.org/pdf/2204.04139v1.pdf Github: https://github.com/gdaosu/lod2buildingmodel submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Are there AIs which are able to simulate a human body when you shoot/hit it, that you can use for video games?
    submitted by /u/TheblackRook3 [link] [comments]  ( 1 min )
    Digital Folktales, a collection of short stories about internet folklore, written and illustrated by Artificial Intelligence
    submitted by /u/fabianmosele [link] [comments]  ( 1 min )
    https://youtu.be/0x0to1wNh6s
    A new enterprise project model supported by AI. In the near future, the growing introduction of automation and artificial intelligence will require updating most activities in the production world, along with changes to contracts, tasks, and integration processes between man and machine. This is supported by the Accenture "IT's Learning" study, according to which 81% of jobs will be affected by AI and robotization. submitted by /u/neologos52 [link] [comments]  ( 1 min )
    How to improve your video editing software with AI?
    submitted by /u/tah_zem [link] [comments]
    Top Ethical Challenges in AI – The Price of Progress
    What does 2022 look like for AI? Let's find out. https://us.sganalytics.com/blog/top-ethical-challenges-in-ai-the-price-of-progress/ submitted by /u/JencyJane [link] [comments]
    Bias in Artificial Intelligence: Is Diversity the Key to the Future Of AI?
    submitted by /u/JencyJane [link] [comments]
    Wrote about KNN — Introduction to DataScience Book
    submitted by /u/mindaslab [link] [comments]
    How to create scenes with text - Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors, a 5-minute paper summary by Casual GAN Papers
    The authors of Make-A-Scene propose a novel text-to-image method that leverages the information from an additional input condition called a “scene” in the form of segmentation tokens to improve the quality of generated images and enable scene editing, out-of-distribution prompts, and text-editing of anchor scenes. As for the details, let’s dive in, shall we? Full summary: https://t.me/casual_gan/284 Blog post: https://www.casualganpapers.com/text-to-image-vqvae-scene-generation/Make-A-Scene-explained.html Make-A-Scene arxiv / code (by Casual GAN Papers Community) Join the discord community and follow on Twitter for weekly AI paper summaries! submitted by /u/KirillTheMunchKing [link] [comments]  ( 1 min )
  • Open

    Measuring Goodhart’s Law
    Goodhart’s law famously says: “When a measure becomes a target, it ceases to be a good measure.” Although originally from economics, it’s something we have to grapple with at OpenAI when figuring out how to optimize objectives that are difficult or costly to measure.  ( 4 min )
  • Open

    MIT Schwarzman College of Computing unveils Break Through Tech AI
    New program strives to bridge the talent gap for underrepresented groups in the tech industry.  ( 5 min )
    Engineers enlist AI to help scale up advanced solar cell manufacturing
    Perovskite materials would be superior to silicon in PV cells, but manufacturing such cells at scale is a huge hurdle. Machine learning can help.  ( 7 min )
  • Open

    Simple and Effective Zero-Shot Task-Oriented Dialogue
    Posted by Jeffrey Zhao and Raghav Gupta, Software Engineers, Google Research Modern conversational agents need to integrate with an ever-increasing number of services to perform a wide variety of tasks, from booking flights and finding restaurants, to playing music and telling jokes. Adding this functionality can be difficult — for each new task, one needs to collect new data and retrain the models that power the conversational agent. This is because most task-oriented dialogue (TOD) models are trained on a single task-specific ontology. An ontology is generally represented as a list of possible user intents (e.g., if the user wants to book a flight, if the user wants to play some music, etc.) and possible parameter slots to extract from the conversation (e.g., the date of the flight, the…  ( 8 min )
  • Open

    Hilbert transform and Fourier series
    A few days ago I wrote about the Hilbert transform and gave as an example that the Hilbert transform of sine is cosine. We’ll bootstrap that example to find the Hilbert transform of any periodic function from its Fourier series. The Hilbert transform of a function f(t) is a function f_H(x) defined by a singular integral, where the […] Hilbert transform and Fourier series first appeared on John D. Cook.  ( 2 min )
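    The sine-to-cosine example can be checked numerically. Below is a minimal FFT-based sketch (pure NumPy, assuming a periodic signal sampled over a full period; note that sign conventions for the Hilbert transform vary between authors):

    ```python
    import numpy as np

    def hilbert_transform(x):
        # FFT-based Hilbert transform of a real, periodic sequence:
        # scale positive frequencies by -i and negative frequencies by +i.
        N = len(x)
        X = np.fft.fft(x)
        h = np.zeros(N)
        h[1:N // 2] = 1.0       # positive-frequency bins
        h[N // 2 + 1:] = -1.0   # negative-frequency bins
        return np.real(np.fft.ifft(-1j * h * X))

    t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
    print(np.allclose(hilbert_transform(np.cos(t)), np.sin(t)))  # True: H[cos] = sin
    ```

    Under this convention each Fourier term a_n cos(nt) + b_n sin(nt) maps to a_n sin(nt) − b_n cos(nt), which is exactly the term-by-term bootstrap the post describes.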
    Logarithms yearning to be free
    I got an evaluation copy of The Best Writing on Mathematics 2021 yesterday. One article jumped out as I was skimming the table of contents: A Zeroth Power Is Often a Logarithm Yearning to Be Free by Sanjoy Mahajan. Great title. There are quite a few theorems involving powers that have an exceptional case that […] Logarithms yearning to be free first appeared on John D. Cook.  ( 2 min )
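    Inferring the article's theme from its title alone: the classic instance is the limit (x^s − 1)/s → ln x as s → 0, where the exceptional "zeroth power" case of the power rule hides a logarithm. A quick numerical check:

    ```python
    import math

    # (x**s - 1) / s approaches ln(x) as s -> 0: the s = 0 "zeroth power"
    # exceptional case is where the logarithm appears.
    x, s = 5.0, 1e-8
    approx = (x**s - 1) / s
    print(abs(approx - math.log(x)) < 1e-6)  # True
    ```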
  • Open

    Is the number of dimensions in the latent space equal to the number of the neurons of the layer? Or perhaps number of neurons in the whole neural network?
    Not speaking specifically about autoencoders here, but about general neural networks. If I understand correctly, "latent space" refers to one of the fully connected layers of the network, and the dimensionality of the space is equal to the number of neurons in that layer. This would mean that each of the layers has a different "latent space" representation of the learned data distribution. Do I understand it correctly? I got really confused because people seem to sometimes refer to latent space as all of the possible activations of all of the neurons in the network (each neuron of the network is one dimension of the latent space) OR EVEN all of the PARAMETERS of the network (each parameter is one dimension of the latent space (??)). Do we have a separate name for these? What do we call the parameter space of a neural network? Is my original intuition even correct? submitted by /u/bzqp2 [link] [comments]  ( 1 min )
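    On the first interpretation in the question, a tiny NumPy sketch (all names and sizes here are illustrative): the "latent space" of one fully connected layer has one dimension per neuron in that layer, so each layer induces its own representation.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    W1 = rng.normal(size=(8, 3))  # fully connected layer with 3 neurons

    def hidden_representation(x):
        # Each input is mapped to a point in a 3-dimensional latent space,
        # one coordinate per neuron in the layer.
        return np.tanh(x @ W1)

    z = hidden_representation(rng.normal(size=(5, 8)))
    print(z.shape)  # (5, 3): five inputs, each a 3-D latent vector
    ```

    The other two usages in the question (all activations, or all parameters) are usually called the activation space and the parameter (or weight) space, respectively, rather than the latent space.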
    Researchers Propose a Novel Framework ‘LilNetX’ For Training Deep Neural Networks With Extreme Model Compression and Structured Sparsification
    In this paper, ‘LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification’, the researchers discuss the importance of large, parameter-heavy, computationally costly architectures in deep neural networks (DNNs) and how they improve computer vision tasks. They also note that the situation is not as simple as it seems: as DNNs become more common in industry, they frequently need to be trained multiple times, communicated across the network to various devices, and executed under hardware limits with minimum loss of accuracy. The question then arises of how to reduce the models' size on-device while still improving their run-time. Explorations in this field have tended to take one of two paths: lowering model size via compression approaches, or reducing computing demands through model pruning. The main achievement of this research from the University of Maryland and Google Research is the introduction of LilNetX, an end-to-end trainable neural network technique that allows learning models with specified accuracy-rate-computation trade-offs. Prior work has taken a piecemeal approach to these difficulties, requiring post-processing or multistage training, which is not efficient and does not scale well to big datasets or architectures. To encourage modest model size, the strategy is to create a joint training objective that penalizes the self-information of network parameters in a reparameterized latent space while simultaneously incorporating priors to increase structured sparsity in the parameter space to decrease computation. Continue Reading Paper: https://arxiv.org/pdf/2204.02965.pdf Github: https://github.com/Sharath-girish/LilNetX submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    What would happen if you connect inputs and outputs randomly to a large hebbian Spiking NN and let it learn shape itself in an environment.
    submitted by /u/The_impact_theory [link] [comments]  ( 2 min )
    noob here who doesn’t really understand calculus
    If I want to take the partial derivative of the error with respect to a certain weight, it would be similar to taking the derivative of, say, y = value * weight + bias. But if I hold the value and bias still, the derivative just becomes the value multiplying the weight, like how the derivative of y = 3x is just 3… so what do I do? It doesn't make sense to multiply 3 by a learning variable and make that the new weight, so what am I missing? submitted by /u/-i-hate-this-place- [link] [comments]  ( 2 min )
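    The missing piece in the question above is the chain rule: the derivative of the *error* with respect to the weight is the derivative of the error with respect to y, times the value, and the update subtracts the learning rate times that gradient (it does not replace the weight with it). A minimal sketch with squared error, all numbers illustrative:

    ```python
    # y = value * weight + bias, squared error E = (y - target)**2.
    # Chain rule: dE/dweight = dE/dy * dy/dweight = 2*(y - target) * value.
    def step(weight, bias, value, target, lr=0.05):
        y = value * weight + bias
        grad_w = 2 * (y - target) * value
        return weight - lr * grad_w  # gradient-descent update

    w = 0.0
    for _ in range(50):
        w = step(w, bias=1.0, value=3.0, target=10.0)
    print(round(w, 6))  # converges to (10 - 1) / 3 = 3.0
    ```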
  • Open

    Does the reward in reinforcement learning have to be immediate reward?
    I'm trying to train a seq2seq model that generates a sentence with T words using reinforcement learning. The input and all the previously generated words form the state of the environment, and generating a word in the sentence is considered an action. In the previous methods [1, 2], the immediate reward r(s_t, a_t, s_{t+1}) for the t-th action a_t is 0 when t < T, and the reward is the CIDEr score (a scalar that measures the quality of the sentence) of the entire sentence when t = T. The policy is updated after the entire sentence is generated. I designed a new reward for each action, and the new reward for the t-th action is not zero when t < T. However, the reward of each action can only be calculated when the entire sentence is generated since it relies on the CIDEr score of the entire sentence, i.e. the reward for all the actions relies on the final state s_T. Can I still define the reward in the form of r(s_t, a_t, s_{t+1}) ? [1] Rennie et al. Self-critical sequence training for image captioning, CVPR 2017: 7008-7024. [2] Ranzato et al. Sequence level training with recurrent neural networks, ICLR 2016. submitted by /u/entalent [link] [comments]  ( 1 min )
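    This is commonly handled by generating the whole episode first and filling in the per-step rewards afterwards, exactly as in REINFORCE-style / self-critical training; the per-action return is then computed backwards from those post-hoc rewards. A sketch (the reward numbers are placeholders, not CIDEr values):

    ```python
    def returns_from_rewards(rewards, gamma=1.0):
        # Discounted return G_t for each action, computed backwards once the
        # full sentence (and hence every post-hoc reward) is available.
        G, out = 0.0, []
        for r in reversed(rewards):
            G = r + gamma * G
            out.append(G)
        return out[::-1]

    # Terminal-only reward (as in [1, 2]): every action shares the final score.
    print(returns_from_rewards([0.0, 0.0, 0.0, 2.5]))  # [2.5, 2.5, 2.5, 2.5]
    # Per-step rewards filled in post hoc work the same way:
    print(returns_from_rewards([0.5, 0.2, 0.1, 2.5], gamma=0.9))
    ```

    Since the policy is only updated after the sentence is complete, it makes no difference that the per-step rewards depend on the final state; formally, though, a reward that depends on s_T is a function of the whole trajectory rather than r(s_t, a_t, s_{t+1}) alone.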
    Question about math
    I am reading the paper 'A Distributional Perspective on Reinforcement Learning', and it is related to measure theory. Is it worth spending the time to study all of real analysis and measure theory? submitted by /u/Professional_Card176 [link] [comments]  ( 1 min )
    What does an oscillating explained_variance signify during training? (PPO)
    submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    SB3- HER+DQN for my simple discrete map env but the training result is pretty bad
    Hi all, I am creating a multi-goal environment: an 8×8 discrete map whose start and terminal state (only one) change after each episode. The reward is 100 for reaching the terminal state and -1 for the rest. In fact, I am not sure if the reward is reasonable. I used PPO from SB3 and it can easily solve it. But when I go offline, using HER+DQN, the training is very bad. Feel free to run it here or take a look at the env and training result. Thank you so much! https://colab.research.google.com/drive/1Mt5Yje7GTyjOBHL09zC9C1L05xpTAK9v?usp=sharing submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
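    One hedged suggestion on the reward question above: HER is usually paired with a sparse goal-conditioned reward (0 at the goal, -1 otherwise, gym `GoalEnv` style) rather than a large terminal bonus, because the goals relabelled in hindsight must be scorable with the same reward function:

    ```python
    # Sparse goal-conditioned reward in the style HER expects; hindsight
    # relabelling reuses this same function with a substituted desired_goal.
    def compute_reward(achieved_goal, desired_goal):
        return 0.0 if achieved_goal == desired_goal else -1.0

    print(compute_reward((3, 4), (3, 4)))   # 0.0 on reaching the goal
    print(compute_reward((3, 4), (7, 7)))   # -1.0 otherwise
    ```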
    Number of Feature VS Action Space in Multi-agent Reinforcement Learning
    Hi all, I am working on a MARL fintech project where I use DDQN, and for the Q-value I use an LSTM because it is time-series data. This is a project overview: it has 7 features, which are derivatives of the ask and bid prices, and an action space of 12 actions. Is it possible to generate a good, reliable model using only 7 features for a 12-action space? The number and quality of features are important for making good decisions in RL. Open for suggestions. #Reinforcement_Learning #MARL submitted by /u/laxuu [link] [comments]  ( 1 min )
    Is there anyone interested in re-implementing APT?
    Hi, these days I'm really interested in self-supervised RL, especially approaches based purely on state novelty, so I wanted to re-implement APT (Behavior From the Void: Unsupervised Active Pre-Training). But my re-implementation showed no meaningful behaviors compared to the official implementation. The official implementation uses DrQ-v2 and an intrinsic curiosity module, so I want to re-implement APT as described in the paper (using DrQ-v1 and contrastive learning). Is there someone who could check my re-implementation? https://github.com/seolhokim/apt In that repository, DrQ-v1 works well, but APT alone doesn't work! I can't understand why the agent stops moving during pre-training. Really, thank you for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
  • Open

    The Voice Synthesis Business: 2022 Update
    In the past few years, high-quality automated text-to-speech synthesis has effectively become a commodity, with easy access to cloud-based… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 15 min )
  • Open

    Massaging Data using Pandas
    When we talk about managing data, it is quite inevitable to see data presented in tables. With column header, and […] The post Massaging Data using Pandas appeared first on Machine Learning Mastery.  ( 25 min )
  • Open

    MLCommons’ David Kanter, NVIDIA’s Daniel Galvez on Improving AI with Publicly Accessible Datasets
    In deep learning and machine learning, having a large enough dataset is key to training a system and getting it to produce results. So what does a ML researcher do when there just isn’t enough publicly accessible data? Enter the MLCommons Association, a global engineering consortium with the aim of making ML better for everyone. Read article > The post MLCommons’ David Kanter, NVIDIA’s Daniel Galvez on Improving AI with Publicly Accessible Datasets appeared first on NVIDIA Blog.  ( 2 min )
  • Open

    Just Tech: Centering Community-Driven Innovation at the Margins Episode 3 with Dr. Sasha Costanza-Chock
    Episode 135 | April 13, 2022 In “Just Tech: Centering Community-Driven Innovation at the Margins,” Senior Principal Researcher Mary L. Gray explores how technology and community intertwine and the role technology can play in supporting community-driven innovation and community-based organizations. Dr. Gray and her team are working to bring computer science, engineering, social science, and […] The post Just Tech: Centering Community-Driven Innovation at the Margins Episode 3 with Dr. Sasha Costanza-Chock appeared first on Microsoft Research.  ( 31 min )

  • Open

    How IoT Uses Machine Learning To Change The World
    IoT and Machine Learning are the most advanced and evolving technologies that continue to rise in today's modern world, simplifying human efforts and making lives easier. These technologies have proved to streamline operations and workflows for various industries and provide more robust and scalable applications that allow users to get things done seamlessly. In recent… Read More »How IoT Uses Machine Learning To Change The World The post How IoT Uses Machine Learning To Change The World appeared first on Data Science Central.  ( 6 min )
    Drag-and-drop Data Pipelining: The Next Disruptor in ML
    Recent advances in machine learning (ML) and artificial intelligence (AI) technologies are helping enterprises across industries quickly move their use cases from the pilot stage to production and operationalization. According to a report by McKinsey & Company, by 2030, businesses that fully absorb AI could double their cash flow, while companies that don’t could… Read More »Drag-and-drop Data Pipelining: The Next Disruptor in ML The post Drag-and-drop Data Pipelining: The Next Disruptor in ML appeared first on Data Science Central.  ( 3 min )
    Advances Highlight the Future of IoT Security
    As the Internet of Things (IoT) is gradually moving from being a centralized structure to a more complex network of innumerable decentralized smart devices, the need for security of data will be acknowledged to a greater degree, thereby promoting the expansion of the global IoT security market. The larger the volume of the data transferred… Read More »Advances Highlight the Future of IoT Security The post Advances Highlight the Future of IoT Security appeared first on Data Science Central.  ( 3 min )
  • Open

    How to Win a Kaggle Competition with Bayesian Optimization
    submitted by /u/aidev2040 [link] [comments]
    The Moment A Neural Net Became Sentient For The First Time - AI Art Story [4K] #shorts
    submitted by /u/fooo-ooo [link] [comments]
  • Open

    Should I use an Encoder-Decoder CNN?
    I'm trying to make a model to play a car racing simulator. I have a dataset with the (human) inputs used to get fast lap times. I would like to make a model that reads the game video output and predicts the arrow key inputs to get a fast lap time. It seems to me that a CNN with encoder-decoder layers trained on the keyboard inputs would work. Is this a good architecture? I'm also having a hard time finding useful literature. Please let me know if there is anything I should look into or do differently. submitted by /u/newroadkill [link] [comments]  ( 1 min )
    Top Trends & Predictions That Will Drive Data Science, AI and Machine Learning in 2022
    submitted by /u/saik2363 [link] [comments]  ( 1 min )
    Last Week in AI: OpenAI DALL-E 2 generates amazing images, Google's 540 billion parameters language model, Clearview AI branches out beyond police, and more!
    submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    Top Trends & Predictions That Will Drive Data Science in 2022
    submitted by /u/saik2363 [link] [comments]
    Conversation about the future, life and AGI
    submitted by /u/HumanSeeing [link] [comments]  ( 1 min )
    AI predicts if and when someone will experience cardiac arrest
    submitted by /u/qptbook [link] [comments]
    The last Woolly Mammoth on Earth
    Is it good or bad? Also, I was wondering what art goes big as an NFT? submitted by /u/Ok-Passion-6574 [link] [comments]
    Stanford Researchers Introduced a Novel Deep Learning Computer-Assisted System for Real-Time Open Surgery and AVOS (the Annotation Videos of Open Surgery) Dataset
    In recent years, the rise of Deep Learning has continuously brought innovations to many fields, and the medical domain is one of them. AI applications in this field are countless: from pre-operative diagnosis to disease classification, from skill assessment to post-operative rehabilitation. Among them, systems to assess surgical skills and provide feedback to improve technique could help in decreasing the number of complications in surgical procedures, which are still the third leading cause of death globally. AI can be an additional coach for surgical trainees and an expert colleague for experienced surgeons. But, to train an AI system, reliable data are fundamental. The most utilized type of data in this context is undoubtedly video streams, as a camera is less invasive than other types of sensors, such as ArmBand or EEG, which could weigh on the surgeon’s performance given their physical bulk. This applies particularly to laparoscopic surgery, where an in-body fiber-optic camera is used to visualize the operating area and facilitate rapid data collection. For this reason, the majority of computer-assisted systems focus on laparoscopic surgery. Continue Reading Paper: https://arxiv.org/pdf/2112.07219.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Machine Learning vs. Cookie Consent Systems
    submitted by /u/DaveBowman1975 [link] [comments]
    Artificial Nightmares: Stone Golem Ruins || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    My epiphany on synthetic media five years later, and what I feel is coming within the next five years
    Roughly five years ago, I created this thread where I outlined my realization about the imminency of synthetic media. This was before transformers blew up, before StyleGAN, before GPT-2, when WaveNet and DeepDream were still among the best we could do, and when predictive text algorithms that were barely better than Markov Chains were still the state of the art. In five short years, the state of artificial intelligence has changed overwhelmingly, to the point it's barely recognizable. Looking back to 2017, I now get this sense of everything feeling so primitive and fake. I've stated many times that AI before roughly 2019 was a bunch of digital magic tricks, and the field as a whole was essentially a giant Potemkin village that utilized clever sleight of hand and advertising to make it se…  ( 7 min )
  • Open

    [N] Substantial plagiarism in BAAI’s “a Road Map for Big Models”
    BAAI recently released a two-hundred-page position paper about large transformer models which contains sections plagiarized from over a dozen other papers. In a massive fit of irony, this was found by Nicholas Carlini, a researcher who (among other things) is famous for studying how language models copy outputs from their training data. Read the blog post here submitted by /u/StellaAthena [link] [comments]  ( 2 min )
    [D] What's the status of live speech-to-speech conversions? (not TTS)
    I've been trying to find information about the subject, but almost every result is TTS, and the only example of what I actually want (Respeecher) costs 2 grand a year. Are there any (preferably open-source) other alternatives? submitted by /u/UncertainOutcome [link] [comments]  ( 1 min )
    Comparison of workshops at major conferences. [D]
    I understand that workshop quality is dependent more on the workshop itself rather than the host conference. But, in general, how are workshops from CVPR, NeurIPS, ICLR, ICML, etc. viewed by the community in relation to one another? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [Project] Leniax - A Lenia simulation library powered by JAX
    Hi everyone! I'm really happy to finally publish the work I've been doing on the cellular automaton called Lenia. It is a JAX library called Leniax that allows one to run thousands of simulations in parallel using CPU, GPU, or TPU. With it, you can: Simulate Conway's Game of Life Simulate multiple Lenia simulations in parallel Use gradient descent to search for continuous CA parameters Launch a quality-diversity (QD) search to discover a ton of diversity in Lenia. Check out the blog post for some visual results The main goal of this work was to advance the state of automatic discovery for those systems. 10 months ago, I bet on QD to do so; turns out it indeed works! QD algorithms really rock! The code is completely open-source with all the examples, notebooks, and even the experiments I ran. (See the doc for more links.) I would love to have feedback on this, and of course, if you find the subject interesting, engage with our community! Cheers! submitted by /u/morgangiraud [link] [comments]  ( 2 min )
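    For a flavor of the simplest system mentioned above: Conway's Game of Life (of which Lenia is a continuous generalization) fits in a few lines of NumPy. This sketch is independent of the Leniax library itself:

    ```python
    import numpy as np

    def life_step(grid):
        # Count the 8 neighbors of every cell with wrapped (toroidal) shifts.
        n = sum(np.roll(np.roll(grid, i, axis=0), j, axis=1)
                for i in (-1, 0, 1) for j in (-1, 0, 1) if (i, j) != (0, 0))
        # A cell is alive next step with exactly 3 neighbors,
        # or with 2 neighbors if it is already alive.
        return ((n == 3) | ((grid == 1) & (n == 2))).astype(np.uint8)

    # A "blinker" oscillates with period 2:
    g = np.zeros((5, 5), dtype=np.uint8)
    g[2, 1:4] = 1
    print(np.array_equal(life_step(life_step(g)), g))  # True
    ```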
    [D] Effective Image Pre-Processing Techniques for Enhancing Defects in an Image?
    So I am doing some object detection on pavement defects. I've already collected the data with some annotations, but the model is performing rather poorly. For example, the `mAP` is about `0.12` for the whole dataset. By examining the data, I think one of the reasons is that some of the defects, such as cracks or faded pavement markings, are not so clear and are either cast in shadow or too bright from the sun. Image Example #1 Or from motion blur Image Example #2 Is there any image preprocessing technique, aside maybe from CLAHE, that could be applied? Moreover, I am currently using YOLOv5 for this. submitted by /u/sarmientoj24 [link] [comments]  ( 3 min )
    [P] the copent package v0.2.3 available on PyPI now
    The copent package implements the method for estimating copula entropy (mutual information) and transfer entropy (conditional mutual information / conditional independence). This version adds a new feature (an argument 'mode') for dealing with large data when memory is limited. Github: https://github.com/majianthu/pycopent PyPI: https://pypi.org/project/copent/ Any comments are welcome. submitted by /u/majianthu [link] [comments]  ( 1 min )
    [D] Feedback on a worked Continuous Deployment Example (CI/CD/CT)
    Hey everyone! At ZenML, we released today an integration that allows users to train and deploy models from pipelines in a simple way. I wanted to ask the community here whether the example we showcased makes sense in a real-world setting: Context ZenML is an extensible, open-source MLOps framework to create production-ready machine learning pipelines. Built for data scientists, it has a simple, flexible syntax, is cloud- and tool-agnostic, and has interfaces/abstractions that are catered towards ML workflows. Seldon Core is a production grade open source model serving platform. It packs a wide range of features built around deploying models to REST/GRPC microservices that include monitoring and logging, model explainers, outlier detectors and various continuous deployment strategies such…  ( 2 min )
    [D] Can we decrease the training time of a deep learning model by using a domain specific pretrained backbone instead of the standard imagenet?
    I am working in the retail domain at the moment and train a lot of image classifiers. I have always used ImageNet-pretrained weights to initialize my models. I thought it would be straightforward to train a backbone on a big retail dataset (1000+ classes) and then use that as the pretrained model, expecting it to reduce the time it takes for my models to generalize. Turns out, the model took more epochs to train when using the retail backbone than the ImageNet one. Isn't this counter-intuitive? What else can I do to make my backbone better? submitted by /u/lMAObigZEDONG [link] [comments]  ( 1 min )
    [D] Removing Unpredictable Samples from a Training Set
    Hi, I have a fairly interesting project that I am working on. I have a dataset in which some samples are completely unpredictable (random noise) and some are reliably predictable. How would you go about separating out the samples which can be predicted, identifying them going forward, and retraining on a cleaned dataset with only those samples? Interested to see someone else's approach to this. Edit: I forgot to mention that my data comes from an embedding matrix built from ordinal categorical features. submitted by /u/Katapilla_Killa [link] [comments]  ( 2 min )
    [D] What's your experience with Model-Agnostic Meta-Learning in RL?
    There is the original paper, and there was a subsequent paper by other authors titled 'On the Convergence Theory of Debiased Model-Agnostic Meta-Reinforcement Learning'. I've been working on implementing the latter paper on the HalfCheetah environment. However, my attempts have been unsuccessful so far (I know the authors provided the code, but I am trying to write my own code to check my understanding). I'd like to know any tips/tricks that anyone can share, and just to hear about people's experiences, especially using MAML for RL. submitted by /u/carlml [link] [comments]  ( 1 min )
    How to deal with the fact that whatever idea I have has already been published.
    I'm always having these ideas for projects and papers; then I look around a bit and find someone who has already studied that idea and published it. It's genuinely annoying. It's been 6 months now, and all the papers are newly published (2021 mostly), so it's even more annoying. How do you deal with that? And how do you find a niche that no one is touching? I just started a PhD, so it's really stressing me out. I feel like I'll never be able to advance on my thesis and that I should just quit, because better work has already been done. submitted by /u/AlanRoofies [link] [comments]  ( 4 min )
    [P] Faster version of cv2.BFMatcher(cv2.NORM_L2) optimized for keypoints matching
    Hi there, in case any of you use the OpenCV BFMatcher with NORM_L2, you can try my recent pet project: https://github.com/kmkolasinski/fast-bfmatcher Basically, the speed-up is achieved by using a faster replacement for BLAS, the BLIS library, along with some custom implementations written in C and Cython. submitted by /u/kmkolasinski [link] [comments]  ( 1 min )
    [D] Are there any comparison studies on learning rate schedules for generative transformers?
    My current research heavily involves generative vision transformers, and after some experimentation it seems like the choice of an LR scheduler is a crucial factor for proper convergence. Does anyone know of any comparison studies done recently that explore various types of schedulers for generative tasks? submitted by /u/Megixist [link] [comments]  ( 4 min )
    [N] Fine-Tuning LayoutLM v2 For Invoice Recognition
    With the advent of deep learning models, automated data extraction is becoming more accessible. In this article, we demonstrate step-by-step how to fine-tune layoutLM V2 on invoices starting from data annotation to model training and inference. Enjoy the read and if you have any questions, leave them below. submitted by /u/UBIAI [link] [comments]  ( 1 min )
  • Open

    Lidar-Camera Deep Fusion for Multi-Modal 3D Detection
    Posted by Yingwei Li, Student Researcher, Google Cloud and Adams Wei Yu, Research Scientist, Google Research, Brain Team LiDAR and visual cameras are two types of complementary sensors used for 3D object detection in autonomous vehicles and robots. LiDAR, which is a remote sensing technique that uses light in the form of a pulsed laser to measure ranges, provides low-resolution shape and depth information, while cameras provide high-resolution shape and texture information. While the features captured by LiDAR and cameras should be merged together to provide optimal 3D object detection, it turns out that most state-of-the-art 3D object detectors use LiDAR as the only input. The main reason is that to develop robust 3D object detection models, most methods need to augment and transform th…  ( 8 min )
  • Open

    Very Deep Neural Networks Explained in 40 Seconds
    By Vincent Granville, Ph.D., Author at MLtechniques.com Sponsored Post Very deep neural networks (VDNN) illustrated with data animation: a 40 second […] The post Very Deep Neural Networks Explained in 40 Seconds appeared first on Machine Learning Mastery.  ( 4 min )
    Scientific Functions in NumPy and SciPy
    Python is a general-purpose computation language, but it is very welcomed in scientific computing. It can replace R and Matlab […] The post Scientific Functions in NumPy and SciPy appeared first on Machine Learning Mastery.  ( 12 min )
  • Open

    What can you tell us about him?
    (A Sci-Fi Ultrashort)  ( 5 min )
  • Open

    MIT’s FutureMakers programs help kids get their minds around — and hands on — AI
    The programs are designed to foster an understanding of how artificial intelligence technologies work, including their social implications.  ( 8 min )
  • Open

    Best GridWorld environment?
    In your opinion, what is the best gridworld environment? I want to compare different RL algorithms on it. I'm looking for something super basic:
    - start and goal state
    - some obstacles
    - customisable: move the start and goal state, place obstacles in different points, modify reward map etc.
    - computationally efficient
    Thank you submitted by /u/wiston_smith [link] [comments]  ( 1 min )
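    If nothing off the shelf fits, an environment with exactly those knobs is only a few lines. A minimal sketch (the class and reward values are hypothetical, not from any library):

    ```python
    class GridWorld:
        # Minimal customizable gridworld: movable start/goal, obstacles,
        # and editable step/goal rewards.
        def __init__(self, size=(4, 4), start=(0, 0), goal=(3, 3),
                     obstacles=(), step_cost=-0.01, goal_reward=1.0):
            self.size, self.start, self.goal = size, start, goal
            self.obstacles = set(obstacles)
            self.step_cost, self.goal_reward = step_cost, goal_reward
            self.pos = start

        def reset(self):
            self.pos = self.start
            return self.pos

        def step(self, action):  # 0: up, 1: down, 2: left, 3: right
            dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
            r, c = self.pos[0] + dr, self.pos[1] + dc
            if (0 <= r < self.size[0] and 0 <= c < self.size[1]
                    and (r, c) not in self.obstacles):
                self.pos = (r, c)
            done = self.pos == self.goal
            reward = self.step_cost + (self.goal_reward if done else 0.0)
            return self.pos, reward, done

    env = GridWorld(obstacles=[(1, 1)])
    env.reset()
    for a in (3, 3, 3, 1, 1, 1):          # walk right, then down, to the goal
        obs, reward, done = env.step(a)
    print(obs, done)  # (3, 3) True
    ```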
    Custom Callback for Max Episode Reward using Stable Baselines3 with Custom Env
    Hi all, I've built a custom gym env and am using Stable Baselines3 to train an agent. I would like to visualise in TensorBoard the maximum reward achieved for each episode. I have these values in a list in my env, and I am trying to create a custom Callback to plot this in TensorBoard but it's not working. I've looked over the documentation and other forums but can't figure out how to do this. Can anyone help me out? 🙏🏽 Thank you! submitted by /u/leozinho2r [link] [comments]  ( 1 min )
    Open-sourced NetHack 2021 NeurIPS Challenge winning agent
Recently, we released the source code of our winning solution for the NetHack 2021 NeurIPS Challenge: https://github.com/maciej-sypetkowski/autoascend We hope that it will help in leveraging this complex environment, which still seems to be beyond the capabilities of reinforcement learning. Check out the links in the README "Description" section for more context. submitted by /u/procedural_only [link] [comments]  ( 1 min )
    Project
Does anyone have a good RL project suggestion, or a complete project for a college major? I have only done work on some self-playing Atari and Mario games. If you have any good ideas, please suggest 🙌 submitted by /u/stoned_egineer [link] [comments]  ( 1 min )
    Training a DQN agent for platformer game
Does anyone have experience training agents to play platformer games like Mario? I am trying to train an agent for the platformer game Jump King to get past at least a few levels using DQN, but the agent is performing poorly after 8000 episodes of training (one episode being: the agent spawns at the start and has 15 seconds or so to jump around gaining reward). He is barely able to get past the first level most of the time :c I am using a very basic sequential network of 2 linear layers with inputDim 4, outputDim 4, and hiddenDim 32, and my state is not using any image data; it's just (current_level, x_pos, y_pos, jumpCount) as input to the network. As for the reward, I am using the y position to give reward if the agent is getting to a new level (large reward) or making progress in the current level (curr_y > old_y); otherwise the agent gets a negative reward. Should I consider using a CNN and image data to train this agent like in the Atari games paper, or is using image data and a conv net going to perform worse than my current state? Should I consider combining image data with the current state, or just keep the current non-image state? Also, roughly how long should I be training the agent for? Is 8000 episodes not enough? 1 episode takes roughly 7 seconds (it uses the pygame engine, and I turned off rendering, which I think made it a little bit faster). This is my first time training an agent for a hard game like this using DQN, so I would appreciate any tips/advice to improve the agent! repo: https://github.com/senweim/JumpKingAtHome submitted by /u/TernaryJimbo [link] [comments]  ( 2 min )
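The reward scheme the post describes (large bonus for reaching a new level, a dense progress reward within a level, a penalty otherwise) can be sketched as a plain function. The constants below are illustrative and not taken from the linked repo:

```python
def shaped_reward(old_level, new_level, old_y, new_y,
                  level_bonus=100.0, progress_scale=1.0, penalty=-1.0):
    """Reward for vertical-progress platformers; values are illustrative."""
    if new_level > old_level:
        return level_bonus                        # reached a new level
    if new_y > old_y:
        return progress_scale * (new_y - old_y)   # progress within the level
    return penalty                                # fell back or no progress
```

Scaling the within-level reward by the height actually gained keeps the signal dense without letting it dwarf the level bonus, which is one common failure mode of hand-shaped rewards.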
    Which environment impress you? (related with software architecture, API, ...)
I want to hear about your impressive environments! Specifically, I want to build my custom environment well using various libraries like OpenAI Gym. In this context, I found highway-env https://github.com/eleurent/highway-env/ ! I think this environment has a convenient API for users, so I built my custom env referencing highway-env. Along these lines, could you share your favorite environment? It doesn't matter what advantage it has! submitted by /u/Seungeon94 [link] [comments]  ( 1 min )

  • Open

    Strategies to deal with Large Action Spaces
Hey guys, I tried building a PPO model for Wordle. My initial test was checking the performance of the model with just 100 words. The agent was able to learn within a few thousand epochs and had an average guess length of about 2.8 before it could correctly identify the words. However, when I extend the action space to the entire 2.3k words, the model barely learns. Even after a few 100k iterations, the mean length hovers around 5.9 (given Wordle has a max of 6 attempts per game). Any suggestions on how to help the agent learn faster in large action spaces? I also tried an embedding-based approach, but the performance was very similar. Thanks submitted by /u/altair9335 [link] [comments]  ( 1 min )
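One commonly suggested trick for large discrete action spaces is invalid-action masking: set the logits of actions ruled out by earlier feedback to negative infinity before the softmax, so probability mass stays on the words still consistent with the clues. A library-agnostic sketch in pure Python (in practice this is applied to the logits tensor inside the policy):

```python
import math

def masked_softmax(logits, valid):
    """Softmax over logits with invalid actions forced to probability 0.

    `valid` is a parallel list of booleans; at least one must be True.
    """
    masked = [l if ok else float("-inf") for l, ok in zip(logits, valid)]
    m = max(x for x in masked if x != float("-inf"))  # numerical stability
    exps = [math.exp(x - m) if x != float("-inf") else 0.0 for x in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

For Wordle this shrinks the effective action space after every guess, which tends to help far more than tuning the learning algorithm itself.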
    What are your best results for ProcGen: CoinRun?
Has anyone managed to get a consistent score of > 9 on CoinRun? I understand that some generated levels require LSTMs in order to be solvable 100% of the time, but even excluding these hard-core levels I can see some occasions where my agents are not operating with 100% effectiveness. For some reason "fully solving" CoinRun seems harder than expected. The papers on CoinRun usually just show the results after 100M steps or so, but I am more interested in what the community has achieved with "normal setups". submitted by /u/tmuxed [link] [comments]  ( 1 min )
    How to use the same action in trained RL network, when model is retested?
I trained an RL agent using the Stable Baselines library and a gym env. When I test the agent, it takes different actions each time I re-run it, even though I used the same seed in the test env.

for i in range(length - lags - 1):
    action, _states = model.predict(obs_test)
    obs_test, rewards, dones, info = env_test.step(action)

When I run the above code again, I get different results. submitted by /u/Mariam_Dundua [link] [comments]  ( 1 min )
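The usual cause: by default `model.predict` in Stable Baselines samples from the policy's action distribution, so runs differ even with a seeded env; passing `deterministic=True` returns the distribution's mode instead. The effect can be illustrated with a toy stochastic policy (the class below is a stand-in for illustration, not the Stable Baselines API):

```python
import random

class ToyPolicy:
    """Stand-in for a stochastic policy over 3 discrete actions."""
    def __init__(self):
        self.probs = [0.2, 0.5, 0.3]

    def predict(self, obs, deterministic=False):
        if deterministic:
            # mode of the distribution: always the same action
            return max(range(len(self.probs)), key=lambda a: self.probs[a])
        # sampling: can differ between otherwise identical runs
        return random.choices(range(len(self.probs)), weights=self.probs)[0]

policy = ToyPolicy()
det = [policy.predict(None, deterministic=True) for _ in range(5)]
# det is [1, 1, 1, 1, 1]: the argmax never changes between calls
```

For truly reproducible stochastic rollouts you would additionally have to seed every RNG involved (Python, NumPy, the deep learning framework, and the env), not just the test env.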
    Unity RL ml agents module, walker example
Hi all, I'm trying to teach my custom FBX model to walk with the help of PPO, as in the example from ML-Agents. I'm having difficulties with the import and the assignment of the Rigidbody; that is, the neural network is being trained, but for some reason physics does not work. Has anyone seen this, or does anyone have an example of how to train a custom FBX model in Unity using ML-Agents? Thx all! submitted by /u/IndependenceCivil576 [link] [comments]  ( 1 min )
    Implementing RL algorithm on apache spark
I want to run an RL algorithm on Apache Spark. However, RL does not exist in Spark's MLlib. Is it possible to implement it? Any links may help. Thank you in advance submitted by /u/fatenLouati [link] [comments]  ( 1 min )
    Is reinforcement learning being used for the development of self-driving cars
We will introduce the general process of self-driving tasks first and then the development of reinforcement learning in self-driving cars. The general process of self-driving tasks includes perceiving, decision-making, planning, and controlling. Perceiving tasks have adopted deep learning, which has done a good job. Unlike supervised learning, decision-intelligence AI methods, represented by reinforcement learning, model the environment as a Markov Decision Process (MDP) for optimization. In sequential decision problems, the utility of an agent's actions does not depend on single decisions (expressed via the state the agent would reach as a result of each decision), but rather on the whole sequence of the agent's actions. Here, one thing that needs t…  ( 4 min )
  • Open

    [R] Transformers replicate Hippocampal representations; notably place and grid cells in the brain
Paper: https://arxiv.org/abs/2112.04035 Yes, the paper is cautious about comparing the model one-to-one to the brain: “Note, we are not saying the brain is closely related to transformers because it learns the same neural representations, instead we are saying the relationship is close because we have shown a mathematical relationship between transformers and carefully formulated neuroscience models of the hippocampal formation.” While objections like "it's just correlation/relation, it's not exactly the same!!" are true to an extent, it's still a very unexpected observation that they're even remotely similar. Needless to say, Transformers were not inspired by the brain - and as more evidence collates (https://www.nature.com/articles/s42003-022-03036-1 --> activations are linearly correlatable) it does feel mysterious; perhaps at least some of the systems used by the brain converge on an efficient pattern discovered by our backpropagated friends... [insert 'coincidence? I think not!' meme] submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 1 min )
    [D] What is the smallest, most capable, generative language model available now?
    I'm looking for a generative-LM equivalent of an EfficientNet-Lite, for inference on devices with limited to no VRAM. I know about some popular ones like DistilGPT2. But it's been 2 years after its release. Surely, someone improved their size/performance ratio, right... right? Thank you for your time. 🤗 submitted by /u/Deep-Station-1746 [link] [comments]  ( 1 min )
    How would you rank major tech companies' research labs for prestige? [D]
    This is just for fun, not to be taken too seriously. But I'm curious what are the reputations among the community for various research divisions (specifically AIML) of major companies, ie: Google, Facebook/Meta, Microsoft, Amazon, NVIDIA, IBM, etc. My perceived (albeit naive) view is Google > Facebook > MSR are top tier. Don't know much about the others. But I've read that some people consider MSR most prestigious due to their academic environment. But I've seen that Google and FB dominate in terms of major publications, ie: vision transformers are associated with Google. submitted by /u/avd4292 [link] [comments]  ( 2 min )
    [P] Squirrel: A new OS library for fast & flexible large-scale data loading
    Hi all, Today we open-sourced Squirrel, a data infrastructure library that my colleagues and I have been working on over the past 1.5 years: https://github.com/merantix-momentum/squirrel-core We’re a team of ~30 ML engineers developing machine learning solutions for industry and research. Across all our projects, we need to load large-scale data in a fast and cost-efficient way, while keeping the flexibility to work with any possible dataset, loaded from local storage, remote data buckets or via APIs such as HuggingFace. Not finding what we were looking for, we decided to build it ourselves. Squirrel has already proven its value in our deep learning projects at Merantix Momentum and shows competitive benchmark results (check them out here). We’re super excited to share it with the OSS community and hope that you can benefit from it as well! Looking forward to hearing your feedback and questions :) submitted by /u/Nextpenade [link] [comments]  ( 1 min )
    [P] Renting lots of GPUs (100-200) in single environment?
I want to apply an already-trained ML model to a huge textual data set. I have funds to rent cloud GPUs, but not much experience using them. Preferably, I set up the environment (downloading the data, model, software packages, etc.) only once and then simply send ~100-200 scripts each to their own GPU for processing. Then at the end everything is in the same location and I can easily send the final result file (~100-200 output files concatenated together) back to my PC. Any advice on how to do that? All GPU renting servers only have 1-8 GPUs per server and do not (seem to) allow for sharing of the environment, which seems very inefficient to me. All comments are appreciated. submitted by /u/Intelligent-End2673 [link] [comments]  ( 2 min )
    [R][P] Algorithmic stability of minibatch SGD
    Hi, was wondering if anyone else has looked into "A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent" and in particular eqn. 2, which is the derivation of Section 3.5 of "Train faster, generalize better: Stability of stochastic gradient descent" adapted for the case where the underlying loss we are interested in guaranteeing generalisation for is upper bounded by M (rather than 1 as assumed by Hardt et al). In the case of minibatch SGD, the number of datapoints n becomes the number of minibatches, as ideally one would like to reduce the number of steps T by maximizing the learning rate, which requires maximizing the minibatch size for the loss to actually converge to 0 on the training data. However, what I am unsure about is, specifically for the classification task where we typically minimize the cross-entropy objective, whether the cross-entropy objective is an upper bound on any kind of M-bounded loss function. In the ideal world, I would like to show that it upper bounds the 0-1 loss which means the cross-entropy over the dataset is an upper bound on the classification accuracy and any generalization statement automatically becomes a statement about the very practical metric of accuracy. Such a statement about cross-entropy upper-bounding 0-1 is made in Section 3C of "Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization". However, one can provide a counterexample in the limit of the softmax "temperature" parameter where the predicted class distribution becomes uniform, in the case of 2 classes, for the typical case of log being the natural logarithm (it is no longer a counter-example if log base 2 is used). I haven't been able to show or find proof that this statement "xent >= 0-1" is true (for some logarithm base and some number of classes) and was hoping that someone might have. submitted by /u/wakeupandshave [link] [comments]  ( 1 min )
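The two-class counterexample mentioned in the post can be made concrete (treating an argmax tie as a classification error, as the post implicitly does):

```latex
% Two classes, true label y = 1, maximally uncertain prediction (1/2, 1/2).
% The 0-1 loss of this tied prediction is 1, but with the natural log
\ell_{\mathrm{xent}} = -\ln \tfrac{1}{2} = \ln 2 \approx 0.693 \;<\; 1 = \ell_{0\text{-}1},
% whereas with base-2 logarithms
\ell_{\mathrm{xent}} = -\log_2 \tfrac{1}{2} = 1 \;\ge\; \ell_{0\text{-}1}.
```

So "xent >= 0-1" fails at this point for the natural logarithm but holds for base 2, which is exactly why the choice of logarithm base matters for the claimed bound.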
    [R] Channel Augmented Joint Learning for Visible-Infrared Recognition
Since going open source in March 2020, MindSpore has gone from strength to strength. The deep learning framework has been downloaded by over 1.2 million users; algorithms running on MindSpore have been published in AI journals or presented at conferences; and countless developments have been released in device-edge-cloud scenarios to transform business fields such as intelligent manufacturing, cloud, wireless, data communication, energy, and consumer business. Built on extensive experience from the scientific, academic, and industrial sectors, MindSpore-based AI papers accounted for 11% of all AI papers in October 2021, ranking No. 2 worldwide by month, and No. 3 worldwide in Q4 2021. In this blog post, based on a paper published at ICCV 2021 by Professor Mang Ye of Wuhan University, we intro…  ( 3 min )
    [R] Looking for Ideas in Pre-training a RL Agent
Hi all, I've been working on reinforcement learning lately, but wanted to come to the general ML subreddit to seek inspiration from other disciplines. I've been working on strategies to decrease the training time for my real-world inverted pendulum experiment. Specifically, I am trying to pre-train the Q network in a simulation before deploying. The strategy that I have found most successful right now is this: start with randomly generated weights REPEAT OF AN EPOCH: - Load new_weights to Q Network - initialize an environment with randomly generated parameters (i.e. random mass, lengths, etc). - Train agent on environment for 100 episodes - Save new_weights I have tried a variety of strategies to add a little bit more control over this process. I've tried a soft update that never showed improvement: W = old_weights * (1 - alpha) + new_weights * alpha I have tried an additive update, which was slightly successful. I measured the success of each network as the sum of rewards over the epoch: A = (old_R)/(old_R+new_R) ; B = (new_R)/(old_R+new_R) W = old_weights * A + new_weights * B But none of these work as well as just using the most recent weights. I've included some results if anyone's interested. The first graph is three test trials with random initial weights; the second graph is with pre-trained weights. This is a pretty hand-wavy way of doing this; does anyone have any suggestions to do this better? submitted by /u/nickthorpie [link] [comments]  ( 2 min )
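The two update rules quoted in the post, written as plain functions over weight vectors (a pure-Python sketch; in practice these run elementwise over the network's parameter tensors):

```python
def soft_update(old_w, new_w, alpha):
    """Polyak-style soft update: W = old*(1 - alpha) + new*alpha."""
    return [o * (1 - alpha) + n * alpha for o, n in zip(old_w, new_w)]

def reward_weighted_update(old_w, new_w, old_r, new_r):
    """Additive update weighted by each network's summed epoch reward."""
    a = old_r / (old_r + new_r)
    b = new_r / (old_r + new_r)
    return [o * a + n * b for o, n in zip(old_w, new_w)]
```

Note that `alpha = 1` recovers "just use the most recent weights", which matches the observation that the plain swap worked best: the soft update only interpolates between the two endpoints.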
    [P] Recommendations for high frequency multivariate time series data
Hey there! I'm looking for advice on datasets to use for a project. We are looking for the following traits: 1) Multivariate (at least 3 or 4, and probably no more than 50 or 100 as an upper bound). 2) High frequency (ideally at least once every 5-10 minutes). 3) We need to have some notion of an underlying 'state' of the data for certain windows. E.g. in an energy setting, period X was the 'family at home using appliances' state. Or in the healthcare setting, period X is 'the patient is in a stable state' while period Y is something like 'the patient experiences a cardiac event'. Nice to have: 4) It'd be great if some features had some level of seasonality while others didn't. Do folks have any recommendations for datasets that meet some (or hopefully all) of the criteria? I did some light perusing on UCI, but it seems like much of it is not high frequency enough, and/or doesn't have some notion of underlying states. submitted by /u/CS_Student95 [link] [comments]  ( 1 min )
    [D] What to do when the authors don't release source code?
Hello, I am currently working on a research paper that I aim to publish at a reputable conference shortly. In our work, we borrow a feature engineering technique from a paper; the technique had not been applied to the domain (time series AD) before that paper. However, the authors haven't released the source code of their implementation of the model (though the feature engineering technique is publicly available). I feel that it is an important baseline and failing to include it would get my paper rejected. I have contacted all the authors for the source code, but none of them responded. The architecture they use is a fairly complicated one and would be very difficult to implement on my own. How do I go about this situation? My advisor told me I can just include a few points in a footnote on why we don't include this as a baseline, those being: No open-source implementation Contacted the authors, didn't receive a response The paper has not been published, only uploaded to arXiv. Any help is appreciated! submitted by /u/mythrowaway0852 [link] [comments]  ( 5 min )
  • Open

    AI when given the prompt of “Amy Schumer” on wombo.art
    submitted by /u/9YearOldGeneralOfPew [link] [comments]  ( 1 min )
    AI News: New Robot Fingertips Can Feel | AI Tracking Satellite | SingularityDAO DynaSets | Tesla Optimus Specs
    submitted by /u/getrich_or_diemining [link] [comments]
    a quick high-level overview of diffusion models (like dall-e 2)
    submitted by /u/individual_kex [link] [comments]
    How can I train an AI to write articles based on my own work?
    Hi all! As a sort of art experiment, I want to train an AI to write tech news articles based on my own work. I worked as a freelance writer for several years and have thousands of articles (each as a Word doc) on tech news. I want to use those articles to train the AI, then have it generate new articles to post to a blog. I have a pretty good understanding of machine learning, but have never trained a model myself. I'm hoping you all can provide some direction. Some specific questions: Can you recommend a model? For each training article, can I provide a "source" (like another news article) so the AI understands where the content in the training article came from? * For each generated article, can I provide a news article source for it to base its content on? ** Can I use the Word docs as the training set, or do I need to convert them into something else for training? *as an example: If I wrote an article on the release of a new Raspberry Pi board, my source might be the press release on the Raspberry Pi website. **as an example: If I want it to generate an article about a new drone delivery service, my input source might be a news article on Reuters or something. submitted by /u/TheSerialHobbyist [link] [comments]  ( 1 min )
    are there any open source video ads generation model out there?
    Hey is there any models to generate videos for advertisment either as text-to-video images-to-video or video-variation creation, if not would video variation generative models would be a good fit for create ads ?? submitted by /u/National-Departure78 [link] [comments]
    DALL-E 2, the future of AI research, and OpenAI’s business model
    submitted by /u/bendee983 [link] [comments]
    How do I learn artificial intelligence from the basics?
    Are there any resources with example-driven explanations, from scratch or the basics? I have seen some websites just jumping into "use this module/library" without explaining what it does or how it works - just some basic examples so that I can build on top or experiment on my own. submitted by /u/-1Mbps [link] [comments]  ( 1 min )
    Using the NEAT algorithm to teach elves to deliver presents
    submitted by /u/zuparnowa [link] [comments]
    Trippy AI Dream 16 - Gothic Style Jungle Fever - VQGAN CliP Rife-Rea...
    submitted by /u/LordPewPew777 [link] [comments]
    Trippy AI Dream 23 - Flower Power² VQGAN CliP Rife-RealESRGAN
    submitted by /u/LordPewPew777 [link] [comments]
    Trippy AI Dream 32 - WE REACHED 100 SUBSCRIBERS !!
    submitted by /u/LordPewPew777 [link] [comments]
    MindSpore has implemented a visible-infrared recognition algorithm
    submitted by /u/Creative_Habit_6868 [link] [comments]
    I want to learn AI from the beginning. Where can I start?
    submitted by /u/Late_Illustrator_545 [link] [comments]  ( 1 min )
    Baidu Researchers Propose PP-YOLOE Object Detector: an Evolved Version of YOLO Achieving SOTA Performance in Object Detection
    Object detection is a crucial problem in computer vision, and YOLO (You Only Look Once) one-stage object detectors have set the bar for performance since the release of YOLOv1 in 2015. The YOLO series has undergone considerable network and structural improvements over the years. The most recent version, YOLOX, has attained an optimal balance of speed and accuracy on the NVIDIA Tesla V100 Tensor Core GPU. Baidu researchers have improved their earlier PP-YOLOv2 model, resulting in PP-YOLOE, a cutting-edge industrial object detector that beats YOLOv5 and YOLOX in the speed-accuracy trade-off. The team's PP-YOLOE-l variant outperforms PP-YOLOv2 by 1.9 percent AP and YOLOX-l by 1.3 percent AP on COCO datasets. The PP-YOLOv2 baseline model architecture comprises a ResNet50-vd backbone with deformable convolution, a PAN neck with an SPP layer and DropBlock, and a lightweight IoU-aware head. PP-YOLOv2 assigns only one anchor box to each ground truth object, similar to YOLOv3. It is strongly reliant on hand-crafted design, which may not generalize well when trained on other datasets, and the technique necessitates a lot of additional hyperparameters. To overcome this problem, Baidu researchers have added an anchor-free technique to PP-YOLOv2 that tiles one anchor point on each pixel and assigns upper and lower bounds for the detection heads, assigning ground truths to a matching feature map. The center of a bounding box can then be used to choose positive samples from the closest pixels. A 4D vector is also predicted for regression; the changes bring a minor model speedup at a small cost in precision. Continue Reading Paper: https://arxiv.org/pdf/2203.16250.pdf Github: https://github.com/PaddlePaddle/PaddleDetection submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Song writing Ai
Hi all, I’m hoping someone could point me in the direction of an AI that I could dump all my previous songwriting into that would spit out something 'inspired by' it. Mostly a bit of fun, but I'm interested in seeing what it throws back out at me. Thanks in advance for any hot tips. submitted by /u/doccaballero [link] [comments]  ( 1 min )
  • Open

    How to train the NN model with a custom dataset?
Hi all. I have been trying to work on an object detection project, basically trying to play around with the code in the documentation for a custom dataset. I am using YOLOv3 and I trained my model using darknet, and it seems the model is learning incorrectly, so the weights are not correct either. I don't know how to check that, but when doing forward propagation the array seems to be ok, without nan values, yet the confidence is mostly 0's and some 0.25's. Can anyone guide me on where I could have gone wrong? code: https://opencv-tutorial.readthedocs.io/en/latest/yolo/yolo.html [yolov3 section] output: outputs = [[0.03846154 0.03846154 0.27884614 0.21634616 0.5 0.25 ] [0.03846154 0.03846154 0. 0.47596154 0. 0. ] [0.03846154 0.03846154 0.89663464 0.78365386 0.5 0.25 ] ... [0.99038464 0.99038464 0.02403846 0.03125 0.5 0.25 ] [0.99038464 0.99038464 0.03846154 0.07211538 0.5 0. ] [0.99038464 0.99038464 0.07932692 0.05528846 0.5 0. ]] confidence = 0.25 0.0 0.25 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.25 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.25 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 0.0 0.0 0.0 0.0 0.0 0.25 ... submitted by /u/ersa17 [link] [comments]  ( 1 min )
  • Open

    Two-stage Training of Graph Neural Networks for Graph Classification. (arXiv:2011.05097v4 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have received massive attention in the field of machine learning on graphs. Inspired by the success of neural networks, a line of research has been conducted to train GNNs to deal with various tasks, such as node classification, graph classification, and link prediction. In this work, our task of interest is graph classification. Several GNN models have been proposed and shown great accuracy in this task. However, the question is whether usual training methods fully realize the capacity of the GNN models. In this work, we propose a two-stage training framework based on triplet loss. In the first stage, GNN is trained to map each graph to a Euclidean-space vector so that graphs of the same class are close while those of different classes are mapped far apart. Once graphs are well-separated based on labels, a classifier is trained to distinguish between different classes. This method is generic in the sense that it is compatible with any GNN model. By adapting five GNN models to our method, we demonstrate a consistent improvement in accuracy and in the utilization of each GNN's allocated capacity over each model's original training method, by up to 5.4 percentage points across 12 datasets.  ( 2 min )
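The first-stage objective in this abstract is a triplet loss over graph embeddings; the standard form can be sketched on plain vectors (squared-Euclidean distance and the margin value here are illustrative, and the paper's exact formulation may differ):

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin) with squared Euclidean d."""
    d = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)
```

Driving this loss to zero pushes same-class embeddings together and different-class ones at least `margin` apart, after which the separate second-stage classifier is trained on the well-separated vectors.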
    Overlapping Spaces for Compact Graph Representations. (arXiv:2007.02445v3 [cs.LG] UPDATED)
    Various non-trivial spaces are becoming popular for embedding structured data such as graphs, texts, or images. Following spherical and hyperbolic spaces, more general product spaces have been proposed. However, searching for the best configuration of product space is a resource-intensive procedure, which reduces the practical applicability of the idea. We generalize the concept of product space and introduce an overlapping space that does not have the configuration search problem. The main idea is to allow subsets of coordinates to be shared between spaces of different types (Euclidean, hyperbolic, spherical). As a result, parameter optimization automatically learns the optimal configuration. Additionally, overlapping spaces allow for more compact representations since their geometry is more complex. Our experiments confirm that overlapping spaces outperform the competitors in graph embedding tasks. Here, we consider both distortion setup, where the aim is to preserve distances, and ranking setup, where the relative order should be preserved. The proposed method effectively solves the problem and outperforms the competitors in both settings. We also perform an empirical analysis in a realistic information retrieval task, where we compare all spaces by incorporating them into DSSM. In this case, the proposed overlapping space consistently achieves nearly optimal results without any configuration tuning. This allows for reducing training time, which can be significant in large-scale applications.  ( 2 min )
    CONet: Channel Optimization for Convolutional Neural Networks. (arXiv:2108.06822v2 [cs.CV] UPDATED)
    Neural Architecture Search (NAS) has shifted network design from using human intuition to leveraging search algorithms guided by evaluation metrics. We study channel size optimization in convolutional neural networks (CNN) and identify the role it plays in model accuracy and complexity. Current channel size selection methods are generally limited by discrete sample spaces while suffering from manual iteration and simple heuristics. To solve this, we introduce an efficient dynamic scaling algorithm -- CONet -- that automatically optimizes channel sizes across network layers for a given CNN. Two metrics -- "Rank" and "Rank Average Slope" -- are introduced to identify the information accumulated in training. The algorithm dynamically scales channel sizes up or down over a fixed searching phase. We conduct experiments on CIFAR10/100 and ImageNet datasets and show that CONet can find efficient and accurate architectures searched in ResNet, DARTS, and DARTS+ spaces that outperform their baseline models. This document supersedes the previously published paper in the ICCV2021-NeurArch workshop. An additional section is included on manual scaling of channel size in CNNs to numerically validate the metrics used in searching for optimum channel configurations in CNNs.  ( 2 min )
    Covariance-Free Sparse Bayesian Learning. (arXiv:2105.10439v2 [eess.SP] UPDATED)
    Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.  ( 2 min )
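The core primitive this abstract leans on, solving A x = b without ever materializing a covariance matrix or its inverse, is the conjugate gradient method. A minimal unpreconditioned sketch in pure Python with a dense matrix for clarity (CoFEM itself adds preconditioning and matrix-free products, so this is only the skeleton of the idea):

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    n = len(b)
    matvec = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(a * c for a, c in zip(u, v))
    x = [0.0] * n
    r = b[:]            # residual b - A x, with x = 0 initially
    p = r[:]            # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

Because CG only touches A through matrix-vector products, the matrix never needs to be stored explicitly, which is exactly what makes the covariance-free scaling argument work in high dimensions.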
    MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories. (arXiv:2106.01808v3 [cs.LG] UPDATED)
    Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer models of the likelihood-to-evidence ratio, or equivalently the posterior function. Here we frame the inference task as an estimation of an energy function parametrized with an artificial neural network. We present an intuitive approach where the optimal model of the likelihood-to-evidence ratio is found by maximizing the likelihood of simulated data. Within this framework, the connection between the task of simulation-based inference and mutual information maximization is clear, and we show how several known methods of posterior estimation relate to alternative lower bounds to mutual information. These distinct objective functions aim at the same optimal energy form and therefore can be directly benchmarked. We compare their accuracy in the inference of model parameters, focusing on four dynamical systems that encompass common challenges in time series analysis: dynamics driven by multiplicative noise, nonlinear interactions, chaotic behavior, and high-dimensional parameter space.  ( 2 min )
    Learning Polynomial Transformations. (arXiv:2204.04209v1 [cs.LG])
    We consider the problem of learning high dimensional polynomial transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a function where every output coordinate is a low-degree polynomial, the goal is to learn the distribution over $p(x)$. This problem is natural in its own right, but is also an important special case of learning deep generative models, namely pushforwards of Gaussians under two-layer neural networks with polynomial activations. Understanding the learnability of such generative models is crucial to understanding why they perform so well in practice. Our first main result is a polynomial-time algorithm for learning quadratic transformations of Gaussians in a smoothed setting. Our second main result is a polynomial-time algorithm for learning constant-degree polynomial transformations of Gaussians in a smoothed setting, when the rank of the associated tensors is small. In fact, our results extend to any rotation-invariant input distribution, not just Gaussian. These are the first end-to-end guarantees for learning a pushforward under a neural network with more than one layer. Along the way, we also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition, a popular generalization of tensor decomposition that is used in practice to implicitly store large tensors.  ( 2 min )
    Measuring AI Systems Beyond Accuracy. (arXiv:2204.04211v1 [cs.SE])
    Current test and evaluation (T&E) methods for assessing machine learning (ML) system performance often rely on incomplete metrics. Testing is additionally often siloed from the other phases of the ML system lifecycle. Research investigating cross-domain approaches to ML T&E is needed to drive the state of the art forward and to build an Artificial Intelligence (AI) engineering discipline. This paper advocates for a robust, integrated approach to testing by outlining six key questions for guiding a holistic T&E strategy.  ( 2 min )
    Learning-Based Vulnerability Analysis of Cyber-Physical Systems. (arXiv:2103.06271v3 [cs.CR] UPDATED)
    This work focuses on the use of deep learning for vulnerability analysis of cyber-physical systems (CPS). Specifically, we consider a control architecture widely used in CPS (e.g., robotics), where the low-level control is based on, e.g., the extended Kalman filter (EKF) and an anomaly detector. To facilitate analyzing the impact that potential sensing attacks could have, our objective is to develop learning-enabled attack generators capable of designing stealthy attacks that maximally degrade system operation. We show how such a problem can be cast within a learning-based grey-box framework where parts of the runtime information are known to the attacker, and introduce two models based on feed-forward neural networks (FNN); both models are trained offline, using a cost function that combines the attack effects on the estimation error and the residual signal used for anomaly detection, so that the trained models are capable of recursively generating such effective sensor attacks in real-time. The effectiveness of the proposed methods is illustrated on several case studies.  ( 2 min )
    A Low-Cost Robot Science Kit for Education with Symbolic Regression for Hypothesis Discovery and Validation. (arXiv:2204.04187v1 [cond-mat.mtrl-sci])
    The next generation of physical science involves robot scientists - autonomous physical science systems capable of experimental design, execution, and analysis in a closed loop. Such systems have shown real-world success for scientific exploration and discovery, including the first discovery of a best-in-class material. To build and use these systems, the next generation workforce requires expertise in diverse areas including ML, control systems, measurement science, materials synthesis, and decision theory, among others. However, education is lagging. Educators need a low-cost, easy-to-use platform to teach the required skills. Industry can also use such a platform for developing and evaluating autonomous physical science methodologies. We present the next generation in science education, a kit for building a low-cost autonomous scientist. The kit was used during two courses at the University of Maryland to teach undergraduate and graduate students autonomous physical science. We discuss its use in the course and its greater capability to teach the dual tasks of autonomous model exploration, optimization, and determination, with an example of autonomous experimental "discovery" of the Henderson-Hasselbalch equation.  ( 2 min )
    Neural graph embeddings via matrix factorization for link prediction: smoothing or truncating negatives?. (arXiv:2011.09907v2 [cs.SI] UPDATED)
    Learning good quality neural graph embeddings has long been achieved by minimizing the pointwise mutual information (PMI) for co-occurring nodes in simulated random walks. This design choice has been mostly popularized by the direct application of the highly-successful word embedding algorithm word2vec to predicting the formation of new links in social, co-citation, and biological networks. However, such a skeuomorphic design of graph embedding methods entails a truncation of information coming from pairs of nodes with low PMI. To circumvent this issue, we propose an improved approach to learning low-rank factorization embeddings that incorporate information from such unlikely pairs of nodes and show that it can improve the link prediction performance of baseline methods by 1.2% to 24.2%. Based on our results and observations, we outline further steps that could improve the design of the next generation of graph embedding algorithms based on matrix factorization.  ( 2 min )
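    The smoothing-versus-truncation distinction is easy to see in a minimal PMI factorization. The function below is the classical PPMI-plus-SVD recipe (a generic illustration, not the paper's exact estimator): with truncate=True, every negative PMI entry is clipped to zero and the information from low-PMI node pairs is discarded; with truncate=False it is retained.

```python
import numpy as np

def pmi_embeddings(C, k=16, truncate=True):
    """Factorize a node co-occurrence count matrix C into rank-k embeddings.
    truncate=True clips negative PMI values to zero (PPMI), discarding
    information from low-PMI node pairs; truncate=False keeps them."""
    C = np.asarray(C, dtype=float)
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log(C * total / (row * col))
    pmi[np.isneginf(pmi)] = 0.0     # pairs never observed carry no signal
    if truncate:
        pmi = np.maximum(pmi, 0.0)  # PPMI: drop negative associations
    U, S, _ = np.linalg.svd(pmi)
    return U[:, :k] * np.sqrt(S[:k])
```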
    Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning. (arXiv:2204.04170v1 [eess.AS])
    Contrastive learning enables learning useful audio and speech representations without ground-truth labels by maximizing the similarity between latent representations of similar signal segments. In this framework, various data augmentation techniques are usually exploited to help enforce desired invariances within the learned representations, improving performance on various audio tasks thanks to more robust embeddings. Selecting the most relevant augmentations has proven crucial for better downstream performance. Thus, this work introduces a conditional independence-based method which allows for automatically selecting a suitable distribution on the choice of augmentations and their parametrization from a set of predefined ones, for contrastive self-supervised pre-training. This is performed with respect to a downstream task of interest, hence saving a costly hyper-parameter search. Experiments performed on two different downstream tasks validate the proposed approach, showing better results than experimenting without augmentation or with baseline augmentations. We furthermore conduct a qualitative analysis of the automatically selected augmentations and their variation according to the considered final downstream dataset.  ( 2 min )
    Benchmarks, Algorithms, and Metrics for Hierarchical Disentanglement. (arXiv:2102.05185v4 [cs.LG] UPDATED)
    In representation learning, there has been recent interest in developing algorithms to disentangle the ground-truth generative factors behind a dataset, and metrics to quantify how fully this occurs. However, these algorithms and metrics often assume that both representations and ground-truth factors are flat, continuous, and factorized, whereas many real-world generative processes involve rich hierarchical structure, mixtures of discrete and continuous variables with dependence between them, and even varying intrinsic dimensionality. In this work, we develop benchmarks, algorithms, and metrics for learning such hierarchical representations.  ( 2 min )
    Adaptive dynamic programming for nonaffine nonlinear optimal control problem with state constraints. (arXiv:1911.11397v3 [eess.SY] UPDATED)
    This paper presents a constrained adaptive dynamic programming (CADP) algorithm to solve general nonlinear nonaffine optimal control problems with known dynamics. Unlike previous ADP algorithms, it can directly deal with problems with state constraints. Firstly, a constrained generalized policy iteration (CGPI) framework is developed to handle state constraints by transforming the traditional policy improvement process into a constrained policy optimization problem. Next, we propose an actor-critic variant of CGPI, called CADP, in which both policy and value functions are approximated by multi-layer neural networks to directly map the system states to control inputs and the value function, respectively. CADP linearizes the constrained optimization problem locally into a quadratically constrained linear programming problem, and then obtains the optimal update of the policy network by solving its dual problem. A trust region constraint is added to prevent excessive policy updates, thus ensuring linearization accuracy. We determine the feasibility of the policy optimization problem by calculating the minimum trust region boundary and update the policy using two recovery rules when infeasible. The vehicle control problem in the path-tracking task is used to demonstrate the effectiveness of this proposed method.  ( 2 min )
    TF-Coder: Program Synthesis for Tensor Manipulations. (arXiv:2003.09040v4 [cs.PL] UPDATED)
    The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch that make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks, to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.  ( 2 min )
    Karaoker: Alignment-free singing voice synthesis with speech training data. (arXiv:2204.04127v1 [eess.AS])
    Existing singing voice synthesis models (SVS) are usually trained on singing data and depend on either error-prone time-alignment and duration features or explicit music score information. In this paper, we propose Karaoker, a multispeaker Tacotron-based model conditioned on voice characteristic features that is trained exclusively on spoken data without requiring time-alignments. Karaoker synthesizes singing voice following a multi-dimensional template extracted from a source waveform of an unseen speaker/singer. The model is jointly conditioned with a single deep convolutional encoder on continuous data including pitch, intensity, harmonicity, formants, cepstral peak prominence and octaves. We extend the text-to-speech training objective with feature reconstruction, classification and speaker identification tasks that guide the model to an accurate result. In addition to multi-tasking, we employ a Wasserstein GAN training scheme as well as new losses on the acoustic model's output to further refine the quality of the model.  ( 2 min )
    On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging. (arXiv:2107.00464v4 [math.OC] UPDATED)
    We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, as well as variations of the method that yield favorable convergence. In sharp contrast with the basic SEG method, whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and this rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.  ( 2 min )
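    The benefit of averaging can be reproduced on a toy bilinear game min_x max_y x^T A y with additive gradient noise; the same noisy sample is reused in the extrapolation and update steps, matching the same-sample setting. Step size, noise level, and initialization below are arbitrary illustration choices, not the paper's experimental setup:

```python
import numpy as np

def seg_bilinear(A, steps=4000, eta=0.1, sigma=0.1, rng=0):
    """Same-sample stochastic extragradient on min_x max_y x^T A y,
    returning both the last iterate and the running iterate average."""
    rng = np.random.default_rng(rng)
    d = A.shape[0]
    x, y = np.ones(d), np.ones(d)
    avg_x, avg_y = np.zeros(d), np.zeros(d)
    for t in range(1, steps + 1):
        gx = rng.normal(0.0, sigma, d)   # gradient noise, reused below
        gy = rng.normal(0.0, sigma, d)
        # extrapolation (half) step
        x_h = x - eta * (A @ y + gx)
        y_h = y + eta * (A.T @ x + gy)
        # update step evaluated at the extrapolated point, same sample
        x = x - eta * (A @ y_h + gx)
        y = y + eta * (A.T @ x_h + gy)
        avg_x += (x - avg_x) / t         # running average of iterates
        avg_y += (y - avg_y) / t
    return (x, y), (avg_x, avg_y)
```

    With A = I the last iterate stalls in a noise ball around the Nash equilibrium at the origin, while the averaged iterate keeps shrinking toward it, which is the qualitative behavior the analysis makes precise.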
    GPSAF: A Generalized Probabilistic Surrogate-Assisted Framework for Constrained Single- and Multi-objective Optimization. (arXiv:2204.04054v1 [math.OC])
    Significant effort has been made to solve computationally expensive optimization problems in the past two decades, and various optimization methods incorporating surrogates into optimization have been proposed. Most research focuses on either exploiting the surrogate by defining a utility optimization problem or customizing an existing optimization method to use one or multiple approximation models. However, little attention has been paid to generic concepts applicable to different types of algorithms and optimization problems simultaneously. Thus, this paper proposes a generalized probabilistic surrogate-assisted framework (GPSAF), applicable to a broad category of unconstrained and constrained, single- and multi-objective optimization algorithms. The idea is based on a surrogate assisting an existing optimization method. The assistance proceeds in two distinct phases, one facilitating exploration and another exploiting the surrogates. The exploration and exploitation of surrogates are automatically balanced by performing a probabilistic knockout tournament among different clusters of solutions. A study of multiple well-known population-based optimization algorithms is conducted with and without the proposed surrogate assistance on single- and multi-objective optimization problems with a maximum solution evaluation budget of 300 or less. The results indicate the effectiveness of applying GPSAF to an optimization algorithm and the competitiveness with other surrogate-assisted algorithms.  ( 2 min )
    Checking HateCheck: a cross-functional analysis of behaviour-aware learning for hate speech detection. (arXiv:2204.04042v1 [cs.CL])
    Behavioural testing -- verifying system capabilities by validating human-designed input-output pairs -- is an alternative evaluation method of natural language processing systems proposed to address the shortcomings of the standard approach: computing metrics on held-out data. While behavioural tests capture human prior knowledge and insights, there has been little exploration on how to leverage them for model training and development. With this in mind, we explore behaviour-aware learning by examining several fine-tuning schemes using HateCheck, a suite of functional tests for hate speech detection systems. To address potential pitfalls of training on data originally intended for evaluation, we train and evaluate models on different configurations of HateCheck by holding out categories of test cases, which enables us to estimate performance on potentially overlooked system properties. The fine-tuning procedure led to improvements in the classification accuracy of held-out functionalities and identity groups, suggesting that models can potentially generalise to overlooked functionalities. However, performance on held-out functionality classes and i.i.d. hate speech detection data decreased, which indicates that generalisation occurs mostly across functionalities from the same class and that the procedure led to overfitting to the HateCheck data distribution.  ( 2 min )
    Neural network training under semidefinite constraints. (arXiv:2201.00632v2 [cs.LG] UPDATED)
    This paper is concerned with the training of neural networks (NNs) under semidefinite constraints, which allows for NN training with robustness and stability guarantees. In particular, we set up an efficient and scalable training scheme for NN training problems of this kind based on interior point methods, while we also exploit the structure of the underlying matrix constraint. We apply our training scheme to several relevant examples that have been studied in the literature and newly present the application of the method to the training of Wasserstein generative adversarial networks (WGANs). In numerical examples, we show the superiority of our method and its applicability to WGAN training.
    Text-Aware Predictive Monitoring of Business Processes. (arXiv:2104.09962v2 [cs.AI] CROSS LISTED)
    The real-time prediction of business processes using historical event data is an important capability of modern business process monitoring systems. Existing process prediction methods are able to also exploit the data perspective of recorded events, in addition to the control-flow perspective. However, while well-structured numerical or categorical attributes are considered in many prediction techniques, almost no technique is able to utilize text documents written in natural language, which can hold information critical to the prediction task. In this paper, we illustrate the design, implementation, and evaluation of a novel text-aware process prediction model based on Long Short-Term Memory (LSTM) neural networks and natural language models. The proposed model can take categorical, numerical and textual attributes in event data into account to predict the activity and timestamp of the next event, the outcome, and the cycle time of a running process instance. Experiments show that the text-aware model is able to outperform state-of-the-art process prediction methods on simulated and real-world event logs containing textual data.
    Neural Network Optimization for Reinforcement Learning Tasks Using Sparse Computations. (arXiv:2201.02571v2 [cs.LG] UPDATED)
    This article proposes a sparse computation-based method for optimizing neural networks for reinforcement learning (RL) tasks. The method combines two ideas: neural network pruning and exploiting correlations in the input data, which makes it possible to update neuron states only when the changes in them exceed a certain threshold. This significantly reduces the number of multiplications when running neural networks. We tested the method on different RL tasks and achieved a 20-150x reduction in the number of multiplications. There were no substantial performance losses; sometimes the performance even improved.
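    The thresholded delta-update idea can be sketched for a single linear layer (the class and threshold below are illustrative, not the article's exact scheme): cache the last input and output, and on each call multiply only the input deltas that exceed the threshold, skipping multiplications for near-unchanged activations.

```python
import numpy as np

class DeltaLinear:
    """Linear layer that caches its last input and output and, on each call,
    multiplies only the input deltas exceeding a threshold -- skipping
    multiplications for near-unchanged activations."""
    def __init__(self, W, threshold=0.05):
        self.W = W
        self.threshold = threshold
        self.x_ref = np.zeros(W.shape[1])
        self.y = np.zeros(W.shape[0])
        self.mults = 0   # scalar multiplications actually performed

    def __call__(self, x):
        delta = x - self.x_ref
        active = np.abs(delta) > self.threshold
        # only columns of W for "active" inputs participate in the update
        self.y = self.y + self.W[:, active] @ delta[active]
        self.x_ref[active] = x[active]
        self.mults += self.W.shape[0] * int(active.sum())
        return self.y
```

    With threshold 0 the layer is exact; with a small positive threshold, temporally correlated inputs (as in consecutive RL observations) activate only a few columns per step, which is where the reported multiplication savings come from.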
    A Manifold View of Adversarial Risk. (arXiv:2203.13277v2 [cs.LG] UPDATED)
    The adversarial risk of a machine learning model has been widely studied. Most previous works assume that the data lies in the whole ambient space. We propose to take a new angle and take the manifold assumption into consideration. Assuming data lies in a manifold, we investigate two new types of adversarial risk, the normal adversarial risk due to perturbation along normal direction, and the in-manifold adversarial risk due to perturbation within the manifold. We prove that the classic adversarial risk can be bounded from both sides using the normal and in-manifold adversarial risks. We also show with a surprisingly pessimistic case that the standard adversarial risk can be nonzero even when both normal and in-manifold risks are zero. We conclude the paper with empirical studies supporting our theoretical results. Our results suggest the possibility of improving the robustness of a classifier by only focusing on the normal adversarial risk.
    An analysis of over-sampling labeled data in semi-supervised learning with FixMatch. (arXiv:2201.00604v2 [cs.LG] UPDATED)
    Most semi-supervised learning methods over-sample labeled data when constructing training mini-batches. This paper studies whether this common practice improves learning and how. We compare it to an alternative setting where each mini-batch is uniformly sampled from all the training data, labeled or not, which greatly reduces direct supervision from true labels in typical low-label regimes. However, this simpler setting can also be seen as more general and even necessary in multi-task problems where over-sampling labeled data would become intractable. Our experiments on semi-supervised CIFAR-10 image classification using FixMatch show a performance drop when using the uniform sampling approach which diminishes when the amount of labeled data or the training time increases. Further, we analyse the training dynamics to understand how over-sampling of labeled data compares to uniform sampling. Our main finding is that over-sampling is especially beneficial early in training but gets less important in the later stages when more pseudo-labels become correct. Nevertheless, we also find that keeping some true labels remains important to avoid the accumulation of confirmation errors from incorrect pseudo-labels.
    Combining Evolution and Deep Reinforcement Learning for Policy Search: a Survey. (arXiv:2203.14009v3 [cs.LG] UPDATED)
    Deep neuroevolution and deep Reinforcement Learning have received a lot of attention in recent years. Some works have compared them, highlighting their pros and cons, but an emerging trend is to combine them so as to benefit from the best of both worlds. In this paper, we provide a survey of this emerging trend by organizing the literature into related groups of works and casting all the existing combinations in each group into a generic framework. We systematically cover all easily available papers irrespective of their publication status, focusing on the combination mechanisms rather than on the experimental results. In total, we cover 45 algorithms more recent than 2017. We hope this effort will favor the growth of the domain by facilitating the understanding of the relationships between the methods, leading to deeper analyses, outlining missing useful comparisons and suggesting new combinations of mechanisms.
    Generative Adversarial Method Based On Neural Tangent Kernels. (arXiv:2204.04090v1 [cs.LG])
    The recent development of Generative adversarial networks (GANs) has driven many computer vision applications. Despite the great synthesis quality, training GANs often confronts several issues, including non-convergence, mode collapse, and gradient vanishing. There exist several workarounds, for example, regularizing Lipschitz continuity and adopting Wasserstein distance. Although these methods can partially solve the problems, we argue that the problems result from modeling the discriminator with deep neural networks. In this paper, we build on the recently derived Neural Tangent Kernel (NTK) theory of deep neural networks and propose a new generative algorithm called generative adversarial NTK (GA-NTK). GA-NTK models the discriminator as a Gaussian Process (GP). With the help of the NTK theories, the training dynamics of GA-NTK can be described with a closed-form formula. To synthesize data with the closed-form formula, the objectives can be simplified into a single-level adversarial optimization problem. We conduct extensive experiments on real-world datasets, and the results show that GA-NTK can generate images comparable to those generated by GANs but is much easier to train under various conditions. We also study the current limitations of GA-NTK and propose some workarounds to make GA-NTK more practical.  ( 2 min )
    Ranking with submodular functions on a budget. (arXiv:2204.04168v1 [cs.DS])
    Submodular maximization has been the backbone of many important machine-learning problems, and has applications to viral marketing, diversification, sensor placement, and more. However, the study of maximizing submodular functions has mainly been restricted to the context of selecting a set of items. On the other hand, many real-world applications require a solution that is a ranking over a set of items. The problem of ranking in the context of submodular function maximization has been considered before, but to a much lesser extent than item-selection formulations. In this paper, we explore a novel formulation for ranking items with submodular valuations and budget constraints. We refer to this problem as max-submodular ranking (MSR). In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints. For the MSR problem with cardinality- and knapsack-type budget constraints we propose practical algorithms with approximation guarantees. In addition, we perform an empirical evaluation, which demonstrates the superior performance of the proposed algorithms against strong baselines.  ( 2 min )
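    An illustrative greedy heuristic (not the paper's approximation algorithms) conveys the MSR setup with set-coverage functions as the submodular valuations: each function only "sees" the ranking prefix up to its budget, and we repeatedly append the item with the largest total marginal gain across functions still within budget.

```python
def greedy_msr(items, coverage, budgets):
    """items: list of item ids; coverage: one dict per submodular coverage
    function, mapping item -> set of covered elements; budgets: position
    b_j up to which function j counts items in the ranking."""
    ranking = []
    covered = [set() for _ in coverage]
    while len(ranking) < len(items):
        pos = len(ranking)   # next rank position (0-indexed)
        def total_gain(it):
            # marginal gain summed over functions whose budget covers pos
            return sum(len(cov_map[it] - covered[j])
                       for j, cov_map in enumerate(coverage)
                       if pos < budgets[j])
        it = max((i for i in items if i not in ranking), key=total_gain)
        ranking.append(it)
        for j, cov_map in enumerate(coverage):
            if pos < budgets[j]:
                covered[j] |= cov_map[it]
    return ranking
```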
    Sample Complexity versus Depth: An Information Theoretic Analysis. (arXiv:2203.00246v3 [cs.LG] UPDATED)
    Deep learning has proven effective across a range of data sets. In light of this, a natural inquiry is: "for what data generating processes can deep learning succeed?" In this work, we study the sample complexity of learning multilayer data generating processes of a sort for which deep neural networks seem to be suited. We develop general and elegant information-theoretic tools that accommodate analysis of any data generating process -- shallow or deep, parametric or nonparametric, noiseless or noisy. We then use these tools to characterize the dependence of sample complexity on the depth of multilayer processes. Our results indicate roughly linear dependence on depth. This is in contrast to previous results that suggest exponential or high-order polynomial dependence.
    Spinning Language Models: Risks of Propaganda-As-A-Service and Countermeasures. (arXiv:2112.05224v2 [cs.CR] UPDATED)
    We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-time attacks that cause models to "spin" their outputs so as to support an adversary-chosen sentiment or point of view -- but only when the input contains adversary-chosen trigger words. For example, a spinned summarization model outputs positive summaries of any text that mentions the name of some individual or organization. Model spinning introduces a "meta-backdoor" into a model. Whereas conventional backdoors cause models to produce incorrect outputs on inputs with the trigger, outputs of spinned models preserve context and maintain standard accuracy metrics, yet also satisfy a meta-task chosen by the adversary. Model spinning enables propaganda-as-a-service, where propaganda is defined as biased speech. An adversary can create customized language models that produce desired spins for chosen triggers, then deploy these models to generate disinformation (a platform attack), or else inject them into ML training pipelines (a supply-chain attack), transferring malicious functionality to downstream models trained by victims. To demonstrate the feasibility of model spinning, we develop a new backdooring technique. It stacks an adversarial meta-task onto a seq2seq model, backpropagates the desired meta-task output to points in the word-embedding space we call "pseudo-words," and uses pseudo-words to shift the entire output distribution of the seq2seq model. We evaluate this attack on language generation, summarization, and translation models with different triggers and meta-tasks such as sentiment, toxicity, and entailment. Spinned models largely maintain their accuracy metrics (ROUGE and BLEU) while shifting their outputs to satisfy the adversary's meta-task. We also show that, in the case of a supply-chain attack, the spin functionality transfers to downstream models.
    Federated Learning with Adaptive Batchnorm for Personalized Healthcare. (arXiv:2112.00734v2 [cs.LG] UPDATED)
    There is a growing interest in applying machine learning techniques for healthcare. Recently, federated machine learning (FL) is gaining popularity since it allows researchers to train powerful models without compromising data privacy and security. However, the performance of existing FL approaches often deteriorates when encountering non-iid situations where there exist distribution gaps among clients, and few previous efforts focus on personalization in healthcare. In this article, we propose AdaFed to tackle domain shifts and obtain personalized models for local clients. AdaFed learns the similarity between clients via the statistics of the batch normalization layers while preserving the specificity of each client with different local batch normalization. Comprehensive experiments on five healthcare benchmarks demonstrate that AdaFed achieves better accuracy compared to state-of-the-art methods (e.g., \textbf{10}\%+ accuracy improvement for PAMAP2) with faster convergence speed.
    Human Hands as Probes for Interactive Object Understanding. (arXiv:2112.09120v2 [cs.CV] UPDATED)
    Interactive object understanding, or what we can do to objects and how, is a long-standing goal of computer vision. In this paper, we tackle this problem through observation of human hands in in-the-wild egocentric videos. We demonstrate that observation of what human hands interact with and how can provide both the relevant data and the necessary supervision. Attending to hands readily localizes and stabilizes active objects for learning, and reveals places where interactions with objects occur. Analyzing the hands shows what we can do to objects and how. We apply these basic principles on the EPIC-KITCHENS dataset and successfully learn state-sensitive features and object affordances (regions of interaction and afforded grasps), purely by observing hands in egocentric videos.
    Image prediction of disease progression by style-based manifold extrapolation. (arXiv:2111.11439v2 [eess.IV] UPDATED)
    Disease-modifying management aims to prevent deterioration and progression of the disease, not just relieve symptoms. Unfortunately, the development of necessary therapies is often hampered by the failure to recognize the presymptomatic disease and by a limited understanding of disease development. We present a generic solution to this problem: a methodology that allows the prediction of progression risk and morphology in individuals using a latent extrapolation optimization approach. To this end, we combined a regularized generative adversarial network (GAN) and a latent nearest neighbor algorithm for joint optimization to generate plausible images of future time points. We evaluated our method on osteoarthritis (OA) data from a multi-center longitudinal study (the Osteoarthritis Initiative, OAI). With presymptomatic baseline data, our model is generative and significantly outperforms the end-to-end learning model in discriminating the progressive cohort. Two experiments were performed with seven experienced radiologists. When no synthetic follow-up radiographs were provided, our model performed better than all seven radiologists. In cases where the synthetic follow-ups generated by our model were available, the specificity and sensitivity of all readers in discriminating progressors increased from $72.3\%$ to $88.6\%$ and from $42.1\%$ to $51.6\%$, respectively. Our results open up a new possibility of using model-based morphology and risk prediction to make predictions about future disease occurrence, as demonstrated in the example of OA.
    Interactive Feature Fusion for End-to-End Noise-Robust Speech Recognition. (arXiv:2110.05267v2 [eess.AS] UPDATED)
    Speech enhancement (SE) aims to suppress the additive noise from a noisy speech signal to improve the speech's perceptual quality and intelligibility. However, the over-suppression phenomenon in the enhanced speech might degrade the performance of the downstream automatic speech recognition (ASR) task due to the missing latent information. To alleviate this problem, we propose an interactive feature fusion network (IFF-Net) for noise-robust speech recognition to learn complementary information from the enhanced feature and original noisy feature. Experimental results show that the proposed method achieves absolute word error rate (WER) reduction of 4.1% over the best baseline on RATS Channel-A corpus. Our further analysis indicates that the proposed IFF-Net can complement some missing information in the over-suppressed enhanced feature.
    Active Linear Regression for $\ell_p$ Norms and Beyond. (arXiv:2111.04888v3 [cs.LG] UPDATED)
    We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde O(d/\epsilon^2)$ queries to $b$ for $p\in(0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1<p<2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2<p<\infty$. For $0<p<2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2<p<\infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1<p<2$, and gave no bounds for $2<p<\infty$ or $0<p<1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}}\log^2n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde O(d^{4-2\sqrt2}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $p$ norm.
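    The sampling idea behind such active regression algorithms can be sketched with plain $\ell_2$ leverage scores (for $p=2$, Lewis weights reduce to leverage scores). This is an illustrative sketch under that simplification, not the paper's algorithm; `active_regression` is a hypothetical helper name.

```python
import numpy as np

def leverage_scores(A):
    # Statistical leverage scores: squared row norms of an orthonormal
    # basis of the column space of A.
    Q, _ = np.linalg.qr(A)
    return np.sum(Q * Q, axis=1)

def active_regression(A, b, m, rng):
    # Sample m rows with probability proportional to leverage, query only
    # those entries of b, and solve the reweighted least-squares problem.
    tau = leverage_scores(A)
    p = tau / tau.sum()
    idx = rng.choice(len(b), size=m, p=p)
    w = 1.0 / np.sqrt(m * p[idx])          # importance-sampling weights
    x, *_ = np.linalg.lstsq(w[:, None] * A[idx], w * b[idx], rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 5))
x_true = rng.standard_normal(5)
b = A @ x_true + 0.01 * rng.standard_normal(2000)
x_hat = active_regression(A, b, 200, rng)      # queries only 200 entries of b
```

    The point of the sketch is the query model: only the sampled entries of $b$ are ever read, yet the reweighted solve still approximates the full least-squares solution.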
    DAD: Data-free Adversarial Defense at Test Time. (arXiv:2204.01568v2 [cs.LG] UPDATED)
    Deep models are highly susceptible to adversarial attacks. Such attacks are carefully crafted imperceptible noises that can fool the network and can cause severe consequences when deployed. To counter them, the model requires training data for adversarial training or explicit regularization-based techniques. However, privacy has become an important concern, restricting access to only the trained models and not the training data (e.g. biometric data). Also, data curation is expensive and companies may have proprietary rights over it. To handle such situations, we propose a novel problem of 'test-time adversarial defense in the absence of training data and even their statistics'. We solve it in two stages: a) detection and b) correction of adversarial samples. Our adversarial sample detection framework is initially trained on arbitrary data and is subsequently adapted to the unlabelled test data through unsupervised domain adaptation. We further correct the predictions on detected adversarial samples by transforming them in the Fourier domain and obtaining their low-frequency component at our proposed suitable radius for model prediction. We demonstrate the efficacy of our proposed technique via extensive experiments against several adversarial attacks and for different model architectures and datasets. For a non-robust Resnet-18 model pre-trained on CIFAR-10, our detection method correctly identifies 91.42% of adversaries. Also, we significantly improve the adversarial accuracy from 0% to 37.37% with a minimal drop of 0.02% in clean accuracy on the state-of-the-art 'Auto Attack' without having to retrain the model.
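    The correction step — keeping only the low-frequency component inside a chosen radius — can be sketched with a simple FFT mask. This is an illustrative sketch; the function name, test image, and radius below are our own choices, not the paper's tuned values.

```python
import numpy as np

def low_frequency_component(img, radius):
    # Keep only spectral content within `radius` of the DC term; adversarial
    # perturbations tend to live in the suppressed high-frequency bands.
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

# A smooth (low-frequency) test image plus broadband sign noise, standing in
# for a clean image plus an adversarial perturbation.
t = np.arange(32)
wave = np.cos(2 * np.pi * 2 * t / 32)
img = wave[:, None] * wave[None, :]
noisy = img + 0.05 * np.sign(np.random.default_rng(0).standard_normal((32, 32)))
denoised = low_frequency_component(noisy, radius=4)
```

    Because the clean image lives entirely inside the mask while the perturbation is spread across all frequencies, the filtered image is much closer to the clean one than the perturbed input is.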
    Measuring disentangled generative spatio-temporal representation. (arXiv:2202.04821v2 [cs.LG] UPDATED)
    Disentangled representation learning offers useful properties such as dimension reduction and interpretability, which are essential to modern deep learning approaches. Although deep learning techniques have been widely applied to spatio-temporal data mining, little attention has been paid to further disentangling the latent features and understanding their contribution to the model performance, particularly their mutual information and correlation across features. In this study, we adopt two state-of-the-art disentangled representation learning methods and apply them to three large-scale public spatio-temporal datasets. To evaluate their performance, we propose an internal evaluation metric focusing on the degree of correlation among the latent variables of the learned representations and on the prediction performance of the downstream tasks. Empirical results show that our modified method can learn disentangled representations that achieve the same level of performance as existing state-of-the-art ST deep learning methods in a spatio-temporal sequence forecasting problem. Additionally, we find that our methods can be used to discover real-world spatio-temporal semantics to describe the variables in the learned representation.
    Identifiability of Label Noise Transition Matrix. (arXiv:2202.02016v2 [cs.LG] UPDATED)
    The noise transition matrix plays a central role in the problem of learning from noisy labels. Among many other reasons, a significant number of existing solutions rely on access to it. Estimating the transition matrix without using ground truth labels is a critical and challenging task. When label noise transition depends on each instance, the problem of identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, we lack a unified understanding of when such a problem remains identifiable, and therefore learnable. This paper seeks to provide answers to a sequence of related questions: What are the primary factors that contribute to the identifiability of a noise transition matrix? Can we explain the observed empirical successes? When a problem is not identifiable, what can we do to make it so? We will relate our theoretical findings to the literature and hope to provide guidelines for developing effective solutions for battling instance-dependent label noise.
    A Spatial-Temporal Attention Multi-Graph Convolution Network for Ride-Hailing Demand Prediction Based on Periodicity with Offset. (arXiv:2203.12505v2 [cs.LG] UPDATED)
    Ride-hailing services are becoming a leading component of urban transportation. To improve the efficiency of ride-hailing services, accurate prediction of transportation demand is a fundamental challenge. In this paper, we tackle this problem from both the network-structure and dataset-formulation aspects. For network design, we propose a spatial-temporal attention multi-graph convolution network (STA-MGCN). A spatial-temporal layer in STA-MGCN is developed to capture the temporal correlations through a temporal attention mechanism and temporal gated convolution, and the spatial correlations through multi-graph convolution. A feature cluster layer is introduced to learn latent regional functions and to reduce the computational burden. For the dataset formulation, we develop a novel approach that considers the transportation characteristic of periodicity with offset. Instead of using only historical data from the same time period, the historical order demand from forward and backward neighboring time periods of the previous day and the previous week is also included. Extensive experiments on three real-world datasets from New York, Chicago, and Chengdu show that the proposed algorithm achieves state-of-the-art performance for ride-hailing demand prediction.
    AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning. (arXiv:2110.13005v4 [cs.LG] UPDATED)
    In the last few years, the memory requirements to train state-of-the-art neural networks have far exceeded the DRAM capacities of modern hardware accelerators. This has necessitated the development of efficient algorithms to train these neural networks in parallel on large-scale GPU-based clusters. Since computation is relatively inexpensive on modern GPUs, designing and implementing extremely efficient communication in these parallel training algorithms is critical for extracting the maximum performance. This paper presents AxoNN, a parallel deep learning framework that exploits asynchrony and message-driven execution to schedule neural network operations on each GPU, thereby reducing GPU idle time and maximizing hardware efficiency. By using CPU memory as a scratch space for offloading data periodically during training, AxoNN is able to reduce GPU memory consumption fourfold. This allows us to increase the number of parameters per GPU fourfold, thus reducing the amount of communication and increasing performance by over 13%. When tested against large transformer models with 12-100 billion parameters on 48-384 NVIDIA Tesla V100 GPUs, AxoNN achieves a per-GPU throughput of 49.4-54.78% of theoretical peak and reduces the training time by 22-37 days (15-25% speedup) as compared to the state-of-the-art.
    Federated Causal Inference in Heterogeneous Observational Data. (arXiv:2107.11732v3 [cs.LG] UPDATED)
    Analyzing observational data from multiple sources can be useful for increasing statistical power to detect a treatment effect; however, practical constraints such as privacy considerations may restrict individual-level information sharing across data sets. This paper develops federated methods that only utilize summary-level information from heterogeneous data sets. Our federated methods provide doubly-robust point estimates of treatment effects as well as variance estimates. We derive the asymptotic distributions of our federated estimators, which are shown to be asymptotically equivalent to the corresponding estimators from the combined, individual-level data. We show that to achieve these properties, federated methods should be adjusted based on conditions such as whether models are correctly specified and stable across heterogeneous data sets.
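    A minimal sketch of the summary-sharing idea: each site computes a doubly-robust (AIPW) point estimate and variance locally, and only those summaries are combined. The inverse-variance weighting and the plug-in nuisance estimates below are illustrative choices of our own, not the paper's exact adjustment.

```python
import numpy as np

def site_aipw(T, Y):
    # One site's doubly-robust (AIPW) summary for the average treatment
    # effect, with simple plug-in nuisances (arm means and the empirical
    # propensity); a real analysis would fit outcome/propensity models.
    e = T.mean()
    mu1, mu0 = Y[T == 1].mean(), Y[T == 0].mean()
    psi = (mu1 - mu0
           + T * (Y - mu1) / e
           - (1 - T) * (Y - mu0) / (1 - e))   # per-unit influence values
    return psi.mean(), psi.var() / len(Y)

def federated_ate(summaries):
    # Only summary-level (estimate, variance) pairs cross site boundaries.
    est, var = (np.array(v) for v in zip(*summaries))
    w = (1 / var) / (1 / var).sum()           # inverse-variance weights
    return float(w @ est), float(w ** 2 @ var)

rng = np.random.default_rng(1)
summaries = []
for n in (500, 800, 1200):                    # three heterogeneous sites
    T = (rng.random(n) < 0.5).astype(float)
    Y = 2.0 * T + rng.standard_normal(n)      # true treatment effect = 2
    summaries.append(site_aipw(T, Y))
ate, var = federated_ate(summaries)
```

    No individual-level row of any site ever leaves that site; only the two numbers returned by `site_aipw` are pooled.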
    Low-Resource Adaptation of Open-Domain Generative Chatbots. (arXiv:2108.06329v2 [cs.CL] UPDATED)
    Recent work building open-domain chatbots has demonstrated that increasing model size improves performance. On the other hand, latency and connectivity considerations dictate moving digital assistants onto the device. Giving a digital assistant like Siri, Alexa, or Google Assistant the ability to discuss just about anything leads to the need for reducing the chatbot model size such that it fits on the user's device. We demonstrate that low-parameter models can simultaneously retain their general-knowledge conversational abilities while improving in a specific domain. Additionally, we propose a generic framework that accounts for variety in question types, tracks references throughout multi-turn conversations, and removes inconsistent and potentially toxic responses. Our framework seamlessly transitions between chatting and performing transactional tasks, which will ultimately make interactions with digital assistants more human-like. We evaluate our framework on 1 internal and 4 public benchmark datasets using both automatic (Perplexity) and human (SSA - Sensibleness and Specificity Average) evaluation metrics and establish comparable performance while reducing model parameters by 90%.
    GCA-Net : Utilizing Gated Context Attention for Improving Image Forgery Localization and Detection. (arXiv:2112.04298v3 [cs.CV] UPDATED)
    Forensic analysis of manipulated pixels requires the identification of various hidden and subtle features from images. Conventional image recognition models generally fail at this task because they are biased and more attentive toward the dominant local and spatial features. In this paper, we propose a novel Gated Context Attention Network (GCA-Net) that utilizes non-local attention in conjunction with a gating mechanism in order to capture the finer image discrepancies and better identify forged regions. The proposed framework uses high dimensional embeddings to filter and aggregate the relevant context from coarse feature maps at various stages of the decoding process. This improves the network's understanding of global differences and reduces false-positive localizations. Our evaluation on standard image forensic benchmarks shows that GCA-Net can both compete against and improve over state-of-the-art networks by an average of 4.7% AUC. Additional ablation studies also demonstrate the method's robustness against attributions and resilience to false-positive predictions.
    Generalizing to Unseen Domains: A Survey on Domain Generalization. (arXiv:2103.03097v6 [cs.LG] UPDATED)
    Machine learning systems generally assume that the training and testing distributions are the same. To this end, a key requirement is to develop models that can generalize to unseen distributions. Domain generalization (DG), i.e., out-of-distribution generalization, has attracted increasing interest in recent years. Domain generalization deals with a challenging setting where one or several different but related domain(s) are given, and the goal is to learn a model that can generalize to an unseen test domain. Great progress has been made in this area in recent years. This paper presents the first review of these recent advances. First, we provide a formal definition of domain generalization and discuss several related fields. We then thoroughly review the theories related to domain generalization and carefully analyze the theory behind generalization. Next, we categorize recent algorithms into three classes: data manipulation, representation learning, and learning strategy, and present several popular algorithms in detail for each category. Third, we introduce the commonly used datasets, applications, and our open-sourced codebase for fair evaluation. Finally, we summarize the existing literature and present some potential research topics for the future.
    Group-based Distinctive Image Captioning with Memory Attention. (arXiv:2108.09151v4 [cs.CV] UPDATED)
    Describing images using natural language is widely known as image captioning, which has made consistent progress due to the development of computer vision and natural language generation techniques. Though conventional captioning models achieve high accuracy on popular metrics, i.e., BLEU, CIDEr, and SPICE, the ability of captions to distinguish the target image from other similar images is under-explored. To generate distinctive captions, a few pioneers employ contrastive learning or re-weight the ground-truth captions, focusing on a single input image. However, the relationships between objects in a similar image group (e.g., items or properties within the same album or fine-grained events) are neglected. In this paper, we improve the distinctiveness of image captions using a Group-based Distinctive Captioning Model (GdisCap), which compares each image with other images in one similar group and highlights the uniqueness of each image. In particular, we propose a group-based memory attention (GMA) module, which stores object features that are unique among the image group (i.e., with low similarity to objects in other images). These unique object features are highlighted when generating captions, resulting in more distinctive captions. Furthermore, the distinctive words in the ground-truth captions are selected to supervise the language decoder and GMA. Finally, we propose a new evaluation metric, distinctive word rate (DisWordRate), to measure the distinctiveness of captions. Quantitative results indicate that the proposed method significantly improves the distinctiveness of several baseline models and achieves state-of-the-art performance on both accuracy and distinctiveness. Results of a user study agree with the quantitative evaluation and demonstrate the rationality of the new metric DisWordRate.
    Pretext Tasks selection for multitask self-supervised speech representation learning. (arXiv:2107.00594v4 [eess.AS] UPDATED)
    Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations that replace traditional input features in the downstream task. In audio/speech signal processing, a wide range of features were engineered through decades of research effort. As it turns out, learning to predict such features (a.k.a. pseudo-labels) has proven to be a particularly relevant pretext task, leading to useful self-supervised representations that prove effective for downstream tasks. However, methods and common practices for combining such pretext tasks for better performance on the downstream task have not been explored and understood properly. In fact, the process relies almost exclusively on a computationally heavy experimental procedure, which becomes intractable as the number of pretext tasks increases. This paper introduces a method to select a group of pretext tasks among a set of candidates. The proposed method estimates calibrated weights for the partial losses corresponding to the considered pretext tasks during the self-supervised training process. Experiments conducted on automatic speech recognition and on speaker and emotion recognition validate our approach, as the groups selected and weighted with our method perform better than classic baselines, thus facilitating the selection and combination of relevant pseudo-labels for self-supervised representation learning.
    Omni-Training for Data-Efficient Deep Learning. (arXiv:2110.07510v2 [cs.LG] UPDATED)
    Learning a generalizable deep model from a few examples in a short time remains a major challenge of machine learning, which has impeded its wide deployment to many scenarios. Recent advances reveal that a properly pre-trained model endows an important property: transferability. A higher transferability of the learned representations indicates a better generalizability across domains of different distributions (domain transferability), or across tasks of different semantics (task transferability). Transferability has become the key to enable data-efficient deep learning, however, existing pre-training methods focus only on domain transferability while meta-training methods only on task transferability. This restricts their data-efficiency in downstream scenarios of diverging domains and tasks. A finding of this paper is that even a tight combination of pre-training and meta-training cannot achieve both kinds of transferability. This motivates the proposed Omni-Training framework towards data-efficient deep learning. Our first contribution is Omni-Net, a tri-flow architecture. Besides the joint representation flow, Omni-Net introduces two new parallel flows for pre-training and meta-training, respectively responsible for learning representations of domain transferability and task transferability. Omni-Net coordinates the parallel flows by routing them via the joint-flow, making each gain the other kind of transferability. Our second contribution is Omni-Loss, in which a self-distillation regularization is imposed to enable knowledge transfer across the training process. Omni-Training is a general framework that accommodates many existing pre-training and meta-training algorithms. A thorough evaluation on cross-task and cross-domain datasets in classification, regression and reinforcement learning problems shows that Omni-Training consistently outperforms the state-of-the-art methods.
    Constraints Penalized Q-learning for Safe Offline Reinforcement Learning. (arXiv:2107.09003v3 [cs.LG] UPDATED)
    We study the problem of safe offline reinforcement learning (RL), in which the goal is to learn a policy that maximizes long-term reward while satisfying safety constraints given only offline data, without further interaction with the environment. This problem is particularly appealing for real-world RL applications, in which data collection is costly or dangerous. Enforcing constraint satisfaction is non-trivial, especially in offline settings, as there is a potentially large discrepancy between the policy distribution and the data distribution, causing errors in estimating the value of safety constraints. We show that na\"ive approaches that combine techniques from safe RL and offline RL can only learn sub-optimal solutions. We thus develop a simple yet effective algorithm, Constraints Penalized Q-Learning (CPQ), to solve the problem. Our method admits the use of data generated by mixed behavior policies. We present a theoretical analysis and demonstrate empirically that our approach can learn robustly across a variety of benchmark control tasks, outperforming several baselines.
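    One plausible reading of the penalization idea can be sketched as paired Bellman targets for a reward critic and a cost critic, where successor actions flagged as constraint-violating contribute no reward value. This is a sketch under our own assumptions, not the paper's exact objective; `cpq_backup` is a hypothetical helper name.

```python
import numpy as np

def cpq_backup(rewards, costs, next_q_r, next_q_c, cost_limit, gamma=0.99):
    # Bellman targets for a reward critic (q_r) and a cost critic (q_c).
    # Successor actions whose cost critic exceeds the limit contribute no
    # reward value, so the learned policy steers away from them.
    feasible = (next_q_c <= cost_limit).astype(float)
    target_r = rewards + gamma * feasible * next_q_r
    target_c = costs + gamma * next_q_c
    return target_r, target_c
```

    A transition bootstrapping through a feasible successor keeps its full discounted value, while one bootstrapping through an infeasible successor is cut off at the immediate reward.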
    How to distribute data across tasks for meta-learning?. (arXiv:2103.08463v3 [cs.LG] UPDATED)
    Meta-learning models transfer the knowledge acquired from previous tasks to quickly learn new ones. They are trained on benchmarks with a fixed number of data points per task. This number is usually arbitrary, and it is unknown how it affects performance at test time. Since labelling data is expensive, finding the optimal allocation of labels across training tasks may reduce costs. Given a fixed budget of labels, should we use a small number of highly labelled tasks, or many tasks with few labels each? Should we allocate more labels to some tasks and fewer to others? We show that: 1) If tasks are homogeneous, there is a uniform optimal allocation, whereby all tasks get the same amount of data; 2) At fixed budget, there is a trade-off between the number of tasks and the number of data points per task, with a unique solution for the optimum; 3) When trained separately, harder tasks should get more data, at the cost of a smaller number of tasks; 4) When training on a mixture of easy and hard tasks, more data should be allocated to easy tasks. Interestingly, neuroscience experiments have shown that human visual skills also transfer better from easy tasks. We prove these results mathematically on mixed linear regression, and we show empirically that the same results hold for few-shot image classification on CIFAR-FS and mini-ImageNet. Our results provide guidance for allocating labels across tasks when collecting data for meta-learning.
    Quantum Machine Learning Framework for Virtual Screening in Drug Discovery: a Prospective Quantum Advantage. (arXiv:2204.04017v1 [quant-ph])
    Machine Learning (ML) for Ligand-Based Virtual Screening (LB-VS) is an important in-silico tool for discovering new drugs in a faster and more cost-effective manner, especially for emerging diseases such as COVID-19. In this paper, we propose a general-purpose framework combining a classical Support Vector Classifier (SVC) algorithm with quantum kernel estimation for LB-VS on real-world databases, and we argue in favor of its prospective quantum advantage. Indeed, we heuristically prove that our quantum integrated workflow can, at least in some relevant instances, provide a tangible advantage compared to state-of-the-art classical algorithms operating on the same datasets, showing a strong dependence on the target and on the feature selection method. Finally, we test our algorithm on IBM Quantum processors using ADRB2 and COVID-19 datasets, showing that hardware simulations provide results in line with the predicted performance and can surpass classical equivalents.
    KCD: Knowledge Walks and Textual Cues Enhanced Political Perspective Detection in News Media. (arXiv:2204.04046v1 [cs.LG])
    Political perspective detection has become an increasingly important task that can help combat echo chambers and political polarization. Previous approaches generally focus on leveraging textual content to identify stances, but they fail to reason with background knowledge or to leverage the rich semantic and syntactic textual labels in news articles. In light of these limitations, we propose KCD, a political perspective detection approach that enables multi-hop knowledge reasoning and incorporates textual cues as paragraph-level labels. Specifically, we first generate random walks on external knowledge graphs and infuse them with news text representations. We then construct a heterogeneous information network to jointly model news content as well as semantic, syntactic and entity cues in news articles. Finally, we adopt relational graph neural networks for graph-level representation learning and conduct political perspective detection. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods on two benchmark datasets. We further examine the effect of knowledge walks and textual cues and how they contribute to our approach's data efficiency.
    Predicting Berth Stay for Tanker Terminals: A Systematic and Dynamic Approach. (arXiv:2204.04085v1 [cs.CE])
    Given the trend of digitization and the increasing volume of maritime transport, predicting vessel berth stay has become a key requirement for operations research and scheduling optimization in the era of maritime big data, playing a significant part in port efficiency and maritime logistics enhancement. This study proposes a systematic and dynamic approach to predicting berth stay for tanker terminals. The approach covers three innovative aspects: 1) The data sources employed are multi-faceted, including cargo operation data from tanker terminals, time-series data from the automatic identification system (AIS), etc. 2) The berth-stay process is innovatively decomposed into multiple blocks according to data analysis and information extraction, and practical operation scenarios are also developed accordingly. 3) The predictive models of berth stay are developed on the basis of prior data analysis and information extraction under two methods, regression and decomposed distribution. The models are evaluated under four dynamic scenarios with certain designated cargoes at two different terminals. The evaluation results show that the proposed approach can predict berth stay with accuracy of up to 98.81%, validated against historical baselines, and also demonstrate that the proposed approach can dynamically predict berth stay across the scenarios. The model may potentially be applied for short-term pilot-booking or scheduling optimization within a reasonable time frame to advance port intelligence and logistics efficiency.
    Global Update Guided Federated Learning. (arXiv:2204.03920v1 [cs.LG])
    Federated learning protects data privacy and security by exchanging models instead of data. However, unbalanced data distributions among participating clients compromise the accuracy and convergence speed of federated learning algorithms. To alleviate this problem, unlike previous studies that limit the distance of updates for local models, we propose global-update-guided federated learning (FedGG), which introduces a model-cosine loss into local objective functions, so that local models can fit local data distributions under the guidance of the update directions of global models. Furthermore, considering that the update direction of a global model is informative in the early stage of training, we propose adaptive loss weights based on the update distances of local models. Numerical simulations show that, compared with other advanced algorithms, FedGG significantly improves model convergence accuracy and speed. Additionally, compared with traditional fixed loss weights, adaptive loss weights make our algorithm more stable and easier to implement in practice.
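    The model-cosine idea can be sketched as a penalty on the angle between a client's update direction and the global update direction. The adaptive weight below is a hypothetical stand-in rule of our own, not the paper's exact distance-based weighting.

```python
import numpy as np

def model_cosine_loss(local_update, global_update):
    # Cosine distance between the local and global update directions;
    # adding this term to the local objective pulls clients toward the
    # global direction on non-IID data.
    cos = (local_update @ global_update) / (
        np.linalg.norm(local_update) * np.linalg.norm(global_update) + 1e-12)
    return 1.0 - cos

def adaptive_weight(update_distance, scale=1.0):
    # Hypothetical adaptive rule: trust the global direction more when the
    # local model has drifted farther from the global model.
    return scale * update_distance
```

    A perfectly aligned local update incurs zero penalty; an opposing update incurs the maximum penalty of 2.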
    Ontology Matching Through Absolute Orientation of Embedding Spaces. (arXiv:2204.04040v1 [cs.AI])
    Ontology matching is a core task when creating interoperable and linked open datasets. In this paper, we explore a novel structure-based mapping approach which is based on knowledge graph embeddings: The ontologies to be matched are embedded, and an approach known as absolute orientation is used to align the two embedding spaces. Next to the approach, the paper presents a first, preliminary evaluation using synthetic and real-world datasets. We find in experiments with synthetic data, that the approach works very well on similarly structured graphs; it handles alignment noise better than size and structural differences in the ontologies.
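    Absolute orientation is the classical closed-form alignment of two matched point sets (orthogonal Procrustes plus a translation). A minimal sketch over matched anchor embeddings, with our own synthetic data standing in for the two ontologies' embedding spaces:

```python
import numpy as np

def absolute_orientation(S, T):
    # Align source embeddings S to target embeddings T (rows are matched
    # anchors): translate both to their centroids, then recover the optimal
    # rotation from the SVD of the cross-covariance matrix.
    cS, cT = S.mean(axis=0), T.mean(axis=0)
    U, _, Vt = np.linalg.svd((S - cS).T @ (T - cT))
    R = U @ Vt                                  # optimal rotation
    return lambda X: (X - cS) @ R + cT          # map for any source point

rng = np.random.default_rng(0)
S = rng.standard_normal((50, 3))
theta = 0.7
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
T = S @ R_true + np.array([1.0, -2.0, 0.5])     # rotated + shifted copy
align = absolute_orientation(S, T)
```

    Once fitted on the anchors, the returned map can place any unmatched source-ontology embedding into the target space, which is where the nearest-neighbor matching then happens.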
    Self-supervised Speaker Diarization. (arXiv:2204.04166v1 [cs.SD])
    Over the last few years, deep learning has grown in popularity for speaker verification, identification, and diarization. Inarguably, a significant part of this success is due to the demonstrated effectiveness of their speaker representations. These, however, are heavily dependent on large amounts of annotated data and can be sensitive to new domains. This study proposes an entirely unsupervised deep-learning model for speaker diarization. Specifically, the study focuses on generating high-quality neural speaker representations without any annotated data, as well as on estimating secondary hyperparameters of the model without annotations. The speaker embeddings are represented by an encoder trained in a self-supervised fashion using pairs of adjacent segments assumed to be of the same speaker. The trained encoder model is then used to self-generate pseudo-labels to subsequently train a similarity score between different segments of the same call using probabilistic linear discriminant analysis (PLDA) and further to learn a clustering stopping threshold. We compared our model to state-of-the-art unsupervised as well as supervised baselines on the CallHome benchmarks. According to empirical results, our approach outperforms unsupervised methods when only two speakers are present in the call, and is only slightly worse than recent supervised models.
    C-NMT: A Collaborative Inference Framework for Neural Machine Translation. (arXiv:2204.04043v1 [cs.LG])
    Collaborative Inference (CI) optimizes the latency and energy consumption of deep learning inference through the inter-operation of edge and cloud devices. Albeit beneficial for other tasks, CI has never been applied to the sequence-to-sequence mapping problem at the heart of Neural Machine Translation (NMT). In this work, we address the specific issues of collaborative NMT, such as estimating the latency required to generate the (unknown) output sequence, and show how existing CI methods can be adapted to these applications. Our experiments show that CI can reduce the latency of NMT by up to 44% compared to a non-collaborative approach.
    Transfer Attacks Revisited: A Large-Scale Empirical Study in Real Computer Vision Settings. (arXiv:2204.04063v1 [cs.CV])
    One intriguing property of adversarial attacks is their "transferability" -- an adversarial example crafted with respect to one deep neural network (DNN) model is often found effective against other DNNs as well. Intensive research has been conducted on this phenomenon under simplistic controlled conditions. Yet, thus far, there is still a lack of comprehensive understanding about transferability-based attacks ("transfer attacks") in real-world environments. To bridge this critical gap, we conduct the first large-scale systematic empirical study of transfer attacks against major cloud-based MLaaS platforms, taking the components of a real transfer attack into account. The study leads to a number of interesting findings which are inconsistent with existing ones, including: (1) Simple surrogates do not necessarily improve real transfer attacks. (2) No dominant surrogate architecture is found in real transfer attacks. (3) It is the gap between posteriors (the output of the softmax layer) rather than the gap between logits (the so-called $\kappa$ value) that increases transferability. Moreover, by comparing with prior works, we demonstrate that transfer attacks possess many previously unknown properties in real-world environments, such as: (1) Model similarity is not a well-defined concept. (2) The $L_2$ norm of the perturbation can generate high transferability without using gradients and is a more powerful source than the $L_\infty$ norm. We believe this work sheds light on the vulnerabilities of popular MLaaS platforms and points to a few promising research directions.
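    The distinction between the logit gap and the posterior gap can be made concrete with a small computation; `gaps` is our own illustrative helper, not part of the study's tooling.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

def gaps(logits, target):
    # Logit gap (the kappa of margin-based attacks) vs. posterior gap: the
    # margin between the target class and the runner-up, measured before
    # and after the softmax.
    z = np.asarray(logits, dtype=float)
    logit_gap = z[target] - np.delete(z, target).max()
    p = softmax(z)
    posterior_gap = p[target] - np.delete(p, target).max()
    return logit_gap, posterior_gap
```

    Two logit vectors can share the same logit gap yet yield different posterior gaps, because the softmax also depends on the remaining classes; that is why the two quantities can rank examples differently.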
    EPASAD: Ellipsoid decision boundary based Process-Aware Stealthy Attack Detector. (arXiv:2204.04154v1 [cs.CR])
    Due to their importance to a nation's economy, Critical Infrastructures (CIs) have been lucrative targets for cyber attackers. These critical infrastructures are usually Cyber-Physical Systems (CPS) such as power grids, water and sewage treatment facilities, oil and gas pipelines, etc. In recent times, these systems have suffered from cyber attacks numerous times. Researchers have been developing cyber security solutions for CIs to avoid lasting damage. According to standard frameworks, cyber security based on identification, protection, detection, response, and recovery is at the core of this research. Detection of an ongoing attack that escapes standard protections such as firewalls, anti-virus, and host/network intrusion detection has gained importance, as such attacks eventually affect the physical dynamics of the system. Therefore, anomaly detection in physical dynamics proves an effective means to implement defense-in-depth. PASAD is one example of anomaly detection in sensor/actuator data, which represents such systems' physical dynamics. We present EPASAD, which improves the detection technique used in PASAD to detect micro-stealthy attacks that, as our experiments show, PASAD's spherical boundary-based detection fails to catch. EPASAD overcomes this by using ellipsoid boundaries, thereby tightening the boundaries in various dimensions, whereas a spherical boundary treats all dimensions equally. We validate EPASAD using the dataset produced by the TE-process simulator and the C-town datasets. The results show that EPASAD improves PASAD's average recall by 5.8% and 9.5% for the two datasets, respectively.
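The geometric change EPASAD describes, replacing a one-radius sphere with an ellipsoid fitted to the data's covariance, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; thresholding on the maximum training-set distance and the toy data are illustrative assumptions. A point with a tiny Euclidean deviation along a low-variance dimension slips past the spherical boundary but is caught by the ellipsoidal (Mahalanobis) one:

```python
import numpy as np

def fit_boundaries(X):
    """Fit a spherical and an ellipsoidal boundary to normal-operation data
    X of shape (n_samples, n_dims)."""
    centroid = X.mean(axis=0)
    # Spherical boundary: a single radius, treating all dimensions equally.
    radius = np.linalg.norm(X - centroid, axis=1).max()
    # Ellipsoidal boundary: per-dimension scaling via the inverse covariance
    # (Mahalanobis distance), tightening low-variance dimensions.
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    diff = X - centroid
    d_mahal = np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))
    threshold = d_mahal.max()
    return centroid, radius, cov_inv, threshold

def is_anomalous(x, centroid, radius, cov_inv, threshold):
    """Return (spherical verdict, ellipsoidal verdict) for a single point."""
    sphere_flag = np.linalg.norm(x - centroid) > radius
    diff = x - centroid
    ellipse_flag = np.sqrt(diff @ cov_inv @ diff) > threshold
    return sphere_flag, ellipse_flag
```

On anisotropic data (e.g. standard deviations of 10 and 0.1 across the two dimensions), a micro-deviation along the tight dimension is exactly the "micro-stealthy" case where the sphere stays silent.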
    Optimizing Coordinative Schedules for Tanker Terminals: An Intelligent Large Spatial-Temporal Data-Driven Approach -- Part 2. (arXiv:2204.03955v1 [cs.CE])
    In this study, a novel coordinative scheduling optimization approach is proposed to enhance port efficiency by reducing weighted average turnaround time. The proposed approach is developed as a heuristic algorithm, applied and investigated over different observation windows with a weekly rolling-horizon method. The experimental results show that the proposed approach is effective and promising in mitigating the turnaround time of vessels. The results demonstrate that the largest potential savings in weighted-average turnaround time are around a 17-hour (28%) reduction on the 1-week observation baseline, a 45-hour (37%) reduction on the 2-week baseline, and a 70-hour (40%) reduction on the 3-week baseline. Even though the experimental results are based on historical datasets, they suggest significant potential benefits if real-time applications were deployed, at quadratic computational complexity.
    Labeling-Free Comparison Testing of Deep Learning Models. (arXiv:2204.03994v1 [cs.LG])
    Various deep neural networks (DNNs) are developed and reported for their tremendous success in multiple domains. Given a specific task, developers can collect massive DNNs from public sources for efficient reuse, avoiding redundant work from scratch. However, testing the performance (e.g., accuracy and robustness) of multiple DNNs and giving a reasonable recommendation of which model should be used is challenging, given the scarcity of labeled data and the demand for domain expertise. Existing testing approaches are mainly selection-based: after sampling, a few of the test data are labeled to discriminate between DNNs. Therefore, due to the randomness of sampling, the performance ranking is not deterministic. In this paper, we propose a labeling-free comparison testing approach to overcome the limitations of labeling effort and sampling randomness. The main idea is to learn a Bayesian model to infer the models' specialty based only on predicted labels. To evaluate the effectiveness of our approach, we undertook exhaustive experiments on 9 benchmark datasets spanning the domains of image, text, and source code, and 165 DNNs. In addition to accuracy, we consider robustness against synthetic and natural distribution shifts. The experimental results demonstrate that the performance of existing approaches degrades under distribution shifts. Our approach outperforms the baseline methods by up to 0.74 and 0.53 on Spearman's correlation and Kendall's $\tau$, respectively, regardless of the dataset and distribution shift. Additionally, we investigated the impact of model quality (accuracy and robustness) and diversity (standard deviation of the quality) on the testing effectiveness and observed that there is a higher chance of a good result when the quality is over 50% and the diversity is larger than 18%.  ( 2 min )
    ECG Biometric Recognition: Review, System Proposal, and Benchmark Evaluation. (arXiv:2204.03992v1 [cs.LG])
    Electrocardiograms (ECGs) have shown unique patterns to distinguish between different subjects and present important advantages compared to other biometric traits, such as difficulty to counterfeit, liveness detection, and ubiquity. Also, with the success of Deep Learning technologies, ECG biometric recognition has received increasing interest in recent years. However, it is not easy to evaluate the improvements of newly proposed ECG methods, mainly due to the lack of public data and standard experimental protocols. In this study, we perform an extensive analysis and comparison of different scenarios in ECG biometric recognition. Both verification and identification tasks are investigated, as well as single- and multi-session scenarios. Finally, we also perform single- and multi-lead ECG experiments, considering traditional scenarios using electrodes on the chest and limbs and current user-friendly wearable devices. In addition, we present ECGXtractor, a robust Deep Learning technology trained with an in-house large-scale database and able to operate successfully across various scenarios and multiple databases. We introduce our proposed feature extractor, trained with multiple sinus-rhythm heartbeats belonging to 55,967 subjects, and provide a general public benchmark evaluation with a detailed experimental protocol. We evaluate the system performance over four different databases: i) our in-house database, ii) PTB, iii) ECG-ID, and iv) CYBHi. With the widely used PTB database, we achieve Equal Error Rates of 0.14% and 2.06% in verification, and accuracies of 100% and 96.46% in identification, respectively, in single- and multi-session analysis. We release the source code, experimental protocol details, and pre-trained models on GitHub to advance the field.  ( 2 min )
    Does Robustness on ImageNet Transfer to Downstream Tasks?. (arXiv:2204.03934v1 [cs.CV])
    As clean ImageNet accuracy nears its ceiling, the research community is increasingly concerned about robust accuracy under distributional shifts. While a variety of methods have been proposed to robustify neural networks, these techniques often target models trained on ImageNet classification. At the same time, it is a common practice to use ImageNet pretrained backbones for downstream tasks such as object detection, semantic segmentation, and image classification from different domains. This raises a question: Can these robust image classifiers transfer robustness to downstream tasks? For object detection and semantic segmentation, we find that a vanilla Swin Transformer, a variant of Vision Transformer tailored for dense prediction tasks, transfers robustness better than Convolutional Neural Networks that are trained to be robust to the corrupted version of ImageNet. For CIFAR10 classification, we find that models that are robustified for ImageNet do not retain robustness when fully fine-tuned. These findings suggest that current robustification techniques tend to emphasize ImageNet evaluations. Moreover, network architecture is a strong source of robustness when we consider transfer learning.  ( 2 min )
    SnapMode: An Intelligent and Distributed Large-Scale Fashion Image Retrieval Platform Based On Big Data and Deep Generative Adversarial Network Technologies. (arXiv:2204.03998v1 [cs.IR])
    Fashion is now among the largest industries worldwide, for it represents human history and helps tell the world's story. As a result of the Fourth Industrial Revolution, the Internet has become an increasingly important source of fashion information. However, with a growing number of web pages and social data, it is nearly impossible for humans to manually keep up with the ongoing evolution and the continuously variable content in this domain. The proper management and exploitation of big data can pave the way for the substantial growth of the global economy as well as citizen satisfaction. Therefore, computer scientists have taken up the challenge of handling e-commerce fashion websites using big data and machine learning technologies. This paper first proposes a scalable focused Web Crawler engine based on distributed computing platforms to extract and process fashion data on e-commerce websites. The role of the proposed platform is then described in developing a disentangled feature extraction method by employing deep convolutional generative adversarial networks (DCGANs) for content-based image indexing and retrieval. Finally, the state-of-the-art solutions are compared, and the results of the proposed approach are analyzed on a standard dataset. For the real-life implementation of the proposed solution, a Web-based application is developed on Apache Storm, Kafka, Solr, and Milvus platforms to create a fashion search engine called SnapMode.  ( 2 min )
    Channel model for end-to-end learning of communications systems: A survey. (arXiv:2204.03944v1 [cs.LG])
    The traditional communication model, based on a chain of multiple independent processing blocks, constrains efficiency and introduces artificial barriers: each individually optimized block does not guarantee end-to-end performance of the system. Recently, end-to-end learning of communications systems through machine learning (ML) has been proposed to optimize the system metrics jointly over all components. These methods show performance improvements but are limited by the requirement of a differentiable channel model. In this study, we summarize the existing approaches that alleviate this problem. We believe that this study will provide a better understanding of the topic and an insight into future research in this field.  ( 2 min )
    Mel-spectrogram features for acoustic vehicle detection and speed estimation. (arXiv:2204.04013v1 [cs.LG])
    The paper addresses acoustic vehicle detection and speed estimation from single sensor measurements. We predict the vehicle's pass-by instant by minimizing the clipped vehicle-to-microphone distance, which is predicted from the mel-spectrogram of input audio in a supervised learning approach. In addition, mel-spectrogram-based features are used directly for vehicle speed estimation, without introducing any intermediate features. The results show that the proposed features can be used for accurate vehicle detection and speed estimation, with an average error of 7.87 km/h. If we formulate speed estimation as a classification problem, with a 10 km/h discretization interval, the proposed method attains an average accuracy of 48.7% for correct class prediction and 91.0% when an offset of one class is allowed. The proposed method is evaluated on a dataset of 304 urban-environment on-field recordings of ten different vehicles.  ( 2 min )
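The clipped vehicle-to-microphone distance target described above can be sketched in a few lines. This is an illustrative reconstruction under simple assumptions (constant speed, straight road, closest distance and clipping value chosen arbitrarily here, not taken from the paper); the pass-by instant is then recovered as the minimiser of the predicted curve:

```python
import numpy as np

def clipped_distance(t, t_passby, speed_mps, d_closest=5.0, d_max=50.0):
    """Clipped vehicle-to-microphone distance used as a regression target.
    The vehicle moves at constant speed along a straight road; the microphone
    sits d_closest metres from the road, so distance is minimal at t_passby.
    Clipping at d_max keeps the target bounded while the vehicle is far away."""
    along_road = speed_mps * (t - t_passby)
    d = np.sqrt(along_road ** 2 + d_closest ** 2)
    return np.minimum(d, d_max)

# The pass-by instant is the minimiser of the (predicted) distance curve.
t = np.linspace(0.0, 10.0, 1001)
target = clipped_distance(t, t_passby=4.2, speed_mps=15.0)
t_hat = t[np.argmin(target)]
```

In the supervised setting, a network would regress this target from the mel-spectrogram; the argmin step above is the detection rule applied to its output.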
    The Complexity of Markov Equilibrium in Stochastic Games. (arXiv:2204.03991v1 [cs.LG])
    We show that computing approximate stationary Markov coarse correlated equilibria (CCE) in general-sum stochastic games is computationally intractable, even when there are two players, the game is turn-based, the discount factor is an absolute constant, and the approximation is an absolute constant. Our intractability results stand in sharp contrast to normal-form games, where exact CCEs are efficiently computable. A fortiori, our results imply that there are no efficient algorithms for learning stationary Markov CCE policies in multi-agent reinforcement learning (MARL), even when the interaction is two-player and turn-based, and both the discount factor and the desired approximation of the learned policies are absolute constants. In turn, these results stand in sharp contrast to single-agent reinforcement learning (RL), where near-optimal stationary Markov policies can be efficiently learned. Complementing our intractability results for stationary Markov CCEs, we provide a decentralized algorithm (assuming shared randomness among players) for learning a nonstationary Markov CCE policy with polynomial time and sample complexity in all problem parameters. Previous work for learning Markov CCE policies all required exponential time and sample complexity in the number of players.  ( 2 min )
    DiversiTree: Computing Diverse Sets of Near-Optimal Solutions to Mixed-Integer Optimization Problems. (arXiv:2204.03822v1 [cs.DM])
    While most methods for solving mixed-integer optimization problems seek a single optimal solution, finding a diverse set of near-optimal solutions can often be more useful. State-of-the-art methods for generating diverse near-optimal solutions usually take a two-phase approach, first finding a set of near-optimal solutions and then finding a diverse subset. In contrast, we present a method of finding a set of diverse solutions by emphasizing diversity within the search for near-optimal solutions. Specifically, within a branch-and-bound framework, we investigate parameterized node-selection rules that explicitly consider diversity. Our results indicate that our approach significantly increases the diversity of the final solution set. When compared with existing methods for finding diverse near-optimal sets, our method runs in a similar time to regular node-selection methods and gives a diversity improvement of up to 140%. In contrast, popular node-selection rules such as best-first search give an improvement of no more than 40%. Further, we find that our method is most effective when diversity is emphasized more strongly in node selection deeper in the tree and when the solution set has grown large enough.  ( 2 min )
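One way such a diversity-aware node-selection rule could look is sketched below. This is not the paper's scoring function; the blend of bound and diversity, the Hamming-distance diversity measure, and the linear depth weighting are all illustrative assumptions, chosen only to reflect the finding that diversity should count more deeper in the tree:

```python
import numpy as np

def node_score(bound, depth, partial_assignment, solutions, max_depth, w=0.5):
    """Score a branch-and-bound node by blending its objective bound with the
    diversity of its partial assignment relative to solutions found so far.
    Diversity is weighted more heavily deeper in the tree."""
    if not solutions:
        diversity = 0.0
    else:
        # Average normalized Hamming distance of the fixed variables
        # to the solutions collected so far.
        keys = list(partial_assignment)
        diversity = np.mean([
            sum(partial_assignment[k] != s[k] for k in keys) / max(len(keys), 1)
            for s in solutions
        ])
    depth_weight = depth / max_depth  # emphasis grows with depth
    return bound + w * depth_weight * diversity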
    Optimizing Coordinative Schedules for Tanker Terminals: An Intelligent Large Spatial-Temporal Data-Driven Approach -- Part 1. (arXiv:2204.03899v1 [cs.CE])
    In this study, a novel coordinative scheduling optimization approach is proposed to enhance port efficiency by reducing average wait time and turnaround time. The proposed approach consists of enhanced particle swarm optimization (ePSO) as the kernel and an augmented firefly algorithm (AFA) as the global optimal search. Two paradigm methods of the proposed approach are investigated: a batch method and a rolling-horizon method. The experimental results show that both paradigm methods of the proposed approach can effectively enhance port efficiency. The average wait time could be significantly reduced by 86.0% - 95.5%, and the average turnaround time could eventually save 38.2% - 42.4% with respect to historical benchmarks. Moreover, the rolling-horizon method reduces running time to 20 minutes over 3-month datasets, compared with 4 hours for the batch method at corresponding maximum performance.
    Network Shuffling: Privacy Amplification via Random Walks. (arXiv:2204.03919v1 [cs.CR])
    Recently, it has been shown that shuffling can amplify the central differential privacy guarantees of data randomized with local differential privacy. In this setup, a centralized, trusted shuffler is responsible for shuffling while keeping the identities of the data anonymous, which subsequently leads to stronger privacy guarantees for systems. However, introducing a centralized entity into the originally local privacy model loses some of the appeal of not having any centralized entity, as in local differential privacy. Moreover, implementing a shuffler in a reliable way is not trivial due to known security issues and/or requirements of advanced hardware or secure computation technology. Motivated by these practical considerations, we rethink the shuffle model to relax the assumption of requiring a centralized, trusted shuffler. We introduce network shuffling, a decentralized mechanism where users exchange data in a random-walk fashion on a network/graph, as an alternative means of achieving privacy amplification via anonymity. We analyze the threat model under such a setting and propose distributed protocols of network shuffling that are straightforward to implement in practice. Furthermore, we show that the privacy amplification rate is similar to that of other privacy amplification techniques such as uniform shuffling. To the best of our knowledge, among the recently studied intermediate trust models that leverage privacy amplification techniques, our work is the first that does not rely on any centralized entity to achieve privacy amplification.  ( 2 min )
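The random-walk exchange at the core of network shuffling can be simulated in a toy sketch. This is not the paper's protocol (the hop count, graph topology, and reporting rule are illustrative assumptions); it only shows the mechanism of decoupling payloads from their origins by passing them along independent random walks:

```python
import random

def network_shuffle(graph, data, steps, rng):
    """Toy round of network shuffling: each datum performs an independent
    random walk of `steps` hops over the user graph, and whoever holds it
    at the end reports it, so payloads are no longer tied to their origins.
    graph: node -> list of neighbours; data: origin node -> payload."""
    holder = {}
    for origin, payload in data.items():
        node = origin
        for _ in range(steps):
            node = rng.choice(graph[node])
        holder.setdefault(node, []).append(payload)
    return holder

# Users arranged in a 5-cycle, each contributing one payload.
graph = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
data = {i: f"d{i}" for i in range(5)}
reported = network_shuffle(graph, data, steps=10, rng=random.Random(0))
```

No payload is lost or duplicated; only the mapping from payload to reporting user is randomized, which is the source of the anonymity-based amplification.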
    Study of a committee of neural networks for biometric hand-geometry recognition. (arXiv:2204.03935v1 [cs.CV])
    This paper studies different committees of neural networks for biometric pattern recognition. We use the neural nets as classifiers for identification and verification purposes. We show that a committee of nets can improve the recognition rates when compared with a multi-start initialization algorithm that just picks the neural net which offers the best performance. On the other hand, we found that there is no strong correlation between identification and verification applications using the same classifier.  ( 2 min )
    Disability prediction in multiple sclerosis using performance outcome measures and demographic data. (arXiv:2204.03969v1 [cs.LG])
    Literature on machine learning for multiple sclerosis has primarily focused on the use of neuroimaging data such as magnetic resonance imaging and clinical laboratory tests for disease identification. However, studies have shown that these modalities are not consistent with disease activity such as symptoms or disease progression. Furthermore, the cost of collecting data from these modalities is high, leading to scarce evaluations. In this work, we used multi-dimensional, affordable, physical and smartphone-based performance outcome measures (POM) in conjunction with demographic data to predict multiple sclerosis disease progression. We performed a rigorous benchmarking exercise on two datasets and present results across 13 clinically actionable prediction endpoints and 6 machine learning models. To the best of our knowledge, our results are the first to show that it is possible to predict disease progression using POMs and demographic data in the context of both clinical trials and smartphone-based studies by using two datasets. Moreover, we investigate our models to understand the impact of different POMs and demographics on model performance through feature ablation studies. We also show that model performance is similar across different demographic subgroups (based on age and sex). To enable this work, we developed an end-to-end reusable pre-processing and machine learning framework which allows quicker experimentation over disparate MS datasets.  ( 2 min )
    KGI: An Integrated Framework for Knowledge Intensive Language Tasks. (arXiv:2204.03985v1 [cs.CL])
    In a recent work, we presented a novel state-of-the-art approach to zero-shot slot filling that extends dense passage retrieval with hard negatives and robust training procedures for retrieval augmented generation models. In this paper, we propose a system based on an enhanced version of this approach where we train task specific models for other knowledge intensive language tasks, such as open domain question answering (QA), dialogue and fact checking. Our system achieves results comparable to the best models in the KILT leaderboards. Moreover, given a user query, we show how the output from these different models can be combined to cross-examine each other. Particularly, we show how accuracy in dialogue can be improved using the QA model. A short video demonstrating the system is available here - \url{https://ibm.box.com/v/kgi-interactive-demo} .  ( 2 min )
    Controllable Missingness from Uncontrollable Missingness: Joint Learning Measurement Policy and Imputation. (arXiv:2204.03872v1 [cs.LG])
    Due to the cost of, or interference from, measurement, we need to control the measurement system. Assuming that each variable can be measured sequentially, there exists an optimal policy for choosing the next measurement given the former observations. Though the optimal measurement policy actually depends on the goal of measurement, we mainly focus on retrieving complete data, so-called imputation. Also, we adapt the imputation method to missingness that varies with the measurement policy. However, learning the measurement policy and imputation requires complete data, which unfortunately cannot be observed. To tackle this problem, we propose a data generation method and a joint learning algorithm. The main ideas are that 1) the data generation method is inherited from the imputation method, and 2) the adaptation of imputation encourages the measurement policy to learn more than individual learning would. We implemented several variations of the proposed algorithm on two different datasets with various missing rates. The experimental results demonstrate that our algorithm is generally applicable and outperforms baseline methods.  ( 2 min )
    CD$^2$-pFed: Cyclic Distillation-guided Channel Decoupling for Model Personalization in Federated Learning. (arXiv:2204.03880v1 [cs.CV])
    Federated learning (FL) is a distributed learning paradigm that enables multiple clients to collaboratively learn a shared global model. Despite the recent progress, it remains challenging to deal with heterogeneous data clients, as the discrepant data distributions usually prevent the global model from delivering good generalization ability on each participating client. In this paper, we propose CD^2-pFed, a novel Cyclic Distillation-guided Channel Decoupling framework, to personalize the global model in FL, under various settings of data heterogeneity. Different from previous works which establish layer-wise personalization to overcome the non-IID data across different clients, we make the first attempt at channel-wise assignment for model personalization, referred to as channel decoupling. To further facilitate the collaboration between private and shared weights, we propose a novel cyclic distillation scheme to impose a consistent regularization between the local and global model representations during the federation. Guided by the cyclical distillation, our channel decoupling framework can deliver more accurate and generalized results for different kinds of heterogeneity, such as feature skew, label distribution skew, and concept shift. Comprehensive experiments on four benchmarks, including natural image and medical image analysis tasks, demonstrate the consistent effectiveness of our method on both local and external validations.  ( 2 min )
    Disentangled Latent Speech Representation for Automatic Pathological Intelligibility Assessment. (arXiv:2204.04016v1 [eess.AS])
    Speech intelligibility assessment plays an important role in the therapy of patients suffering from pathological speech disorders. Automatic and objective measures are desirable to assist therapists in their traditionally subjective and labor-intensive assessments. In this work, we investigate a novel approach for obtaining such a measure using the divergence in disentangled latent speech representations of a parallel utterance pair, obtained from a healthy reference and a pathological speaker. Experiments on an English database of Cerebral Palsy patients, using all available utterances per speaker, show high and significant correlation values (R = -0.9) with subjective intelligibility measures, while having only minimal deviation (+-0.01) across four different reference speaker pairs. We also demonstrate the robustness of the proposed method (R = -0.89 deviating +-0.02 over 1000 iterations) by considering a significantly smaller amount of utterances per speaker. Our results are among the first to show that disentangled speech representations can be used for automatic pathological speech intelligibility assessment, resulting in a reference speaker pair invariant method, applicable in scenarios with only few utterances available.  ( 2 min )
    Exploring the Universality of Hadronic Jet Classification. (arXiv:2204.03812v1 [hep-ph])
    The modeling of jet substructure significantly differs between Parton Shower Monte Carlo (PSMC) programs. Despite this, we observe that machine learning classifiers trained on different PSMCs learn nearly the same function. This means that when these classifiers are applied to the same PSMC for testing, they result in nearly the same performance. This classifier universality indicates that a machine learning model trained on one simulation and tested on another simulation (or data) will likely be optimal. Our observations are based on detailed studies of shallow and deep neural networks applied to simulated Lorentz boosted Higgs jet tagging at the LHC.  ( 2 min )
    SuperNet in Neural Architecture Search: A Taxonomic Survey. (arXiv:2204.03916v1 [cs.CV])
    Deep Neural Networks (DNNs) have made significant progress in a wide range of visual recognition tasks such as image classification, object detection, and semantic segmentation. The evolution of convolutional architectures has led to better performance at the cost of expensive computation. In addition, network design has become a difficult task, which is labor-intensive and requires a high level of domain knowledge. To mitigate such issues, there have been studies of a variety of neural architecture search methods that automatically search for optimal architectures, achieving models with impressive performance that outperform human-designed counterparts. This survey aims to provide an overview of existing works in this field of research and specifically focuses on supernet optimization, which builds a neural network that assembles all the architectures as its sub-models via weight sharing. We aim to accomplish this by categorizing supernet optimization methods as solutions to the common challenges found in the literature: data-side optimization, poor rank correlation alleviation, and transferable NAS for a number of deployment scenarios.  ( 2 min )
    Data-Driven Evaluation of Training Action Space for Reinforcement Learning. (arXiv:2204.03840v1 [cs.LG])
    Training action space selection for reinforcement learning (RL) is conflict-prone due to complex state-action relationships. To address this challenge, this paper proposes a Shapley-inspired methodology for training action space categorization and ranking. To avoid the exponential-time exact Shapley computation, the methodology includes a Monte Carlo simulation that skips unnecessary explorations. The effectiveness of the methodology is illustrated using a cloud infrastructure resource tuning case study. It reduces the search space by 80% and categorizes the training action sets into dispensable and indispensable groups. Additionally, it ranks different training actions to facilitate high-performance yet cost-efficient RL model design. The proposed data-driven methodology is extensible to different domains, use cases, and reinforcement learning algorithms.  ( 2 min )
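The permutation-sampling Shapley estimate behind such a categorization can be sketched as follows. This is a generic Monte Carlo Shapley routine, not the paper's code; the action names and the additive toy value function are illustrative assumptions (in an additive game, each action's Shapley value equals its own weight, so zero-contribution actions land in the "dispensable" group):

```python
import random

def mc_shapley(actions, value_fn, n_perms=200, rng=None):
    """Monte Carlo Shapley estimate: average each action's marginal
    contribution over random permutations instead of enumerating all
    2^n coalitions."""
    rng = rng or random.Random(0)
    phi = {a: 0.0 for a in actions}
    for _ in range(n_perms):
        perm = list(actions)
        rng.shuffle(perm)
        coalition = []
        prev = value_fn(frozenset(coalition))
        for a in perm:
            coalition.append(a)
            cur = value_fn(frozenset(coalition))
            phi[a] += cur - prev  # marginal contribution of a in this order
            prev = cur
    return {a: total / n_perms for a, total in phi.items()}

# Hypothetical additive value function over toy tuning actions.
weights = {"cpu": 1.0, "memory": 2.0, "logging": 0.0}
values = mc_shapley(list(weights), lambda S: sum(weights[a] for a in S))
```

Ranking the actions by their estimated values then gives both the dispensable/indispensable split and the ordering used for model design.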
    Engagement Detection with Multi-Task Training in E-Learning Environments. (arXiv:2204.04020v1 [cs.CV])
    Recognition of user interaction, in particular engagement detection, became highly crucial for online working and learning environments, especially during the COVID-19 outbreak. Such recognition and detection systems significantly improve the user experience and efficiency by providing valuable feedback. In this paper, we propose a novel Engagement Detection with Multi-Task Training (ED-MTT) system which minimizes mean squared error and triplet loss together to determine the engagement level of students in an e-learning environment. The performance of this system is evaluated and compared against the state-of-the-art on a publicly available dataset as well as videos collected from real-life scenarios. The results show that ED-MTT achieves 6% lower MSE than the best state-of-the-art performance with highly acceptable training time and lightweight feature extraction.
    Blockchain as an Enabler for Transfer Learning in Smart Environments. (arXiv:2204.03959v1 [cs.AI])
    The knowledge embodied in machine learning models for intelligent systems is commonly associated with time-consuming and costly processes such as large-scale data collection, data labelling, network training, and fine-tuning of models. Sharing and reuse of these elaborated models between intelligent systems deployed in different environments, which is known as transfer learning, would facilitate the adoption of services for users and accelerate the uptake of intelligent systems in environments such as smart building and smart city applications. In this context, the communication and knowledge exchange between AI-enabled environments depend on a complicated network of systems, systems of systems, digital assets, and their chains of dependencies that hardly follow the centralized schema of traditional information systems. Rather, it requires an adaptive decentralized system architecture that is empowered by features such as data provenance, workflow transparency, and validation of process participants. In this research, we propose a decentralized and adaptive software framework based on blockchain and knowledge graph technologies that supports knowledge exchange and interoperability between IoT-enabled environments in a transparent and trustworthy way.  ( 2 min )
    Does the Market of Citations Reward Reproducible Work?. (arXiv:2204.03829v1 [cs.DL])
    The field of bibliometrics, studying citations and behavior, is critical to the discussion of reproducibility. Citations are one of the primary incentive and reward systems for academic work, and so we desire to know whether this incentive rewards reproducible work. Yet to the best of our knowledge, only one work has attempted to look at this combined space, concluding that non-reproducible work is more highly cited. We show that answering this question is more challenging than first proposed, and subtle issues can inhibit a robust conclusion. To make inferences with more robust behavior, we propose a hierarchical Bayesian model that incorporates the citation rate over time, rather than the total number of citations after a fixed amount of time. In doing so, we show that, under current evidence, it is more likely that certain fields of study, such as Medicine and Machine Learning (ML), do correlate reproducible works with more citations, while other fields appear to have no relationship. Further, we find that making code available and thoroughly referencing prior works also appear to positively correlate with increased citations. Our code and data can be found at https://github.com/EdwardRaff/ReproducibleCitations .  ( 2 min )
    A posteriori learning for quasi-geostrophic turbulence parametrization. (arXiv:2204.03911v1 [physics.flu-dyn])
    The use of machine learning to build subgrid parametrizations for climate models is receiving growing attention. State-of-the-art strategies address the problem as a supervised learning task and optimize algorithms that predict subgrid fluxes based on information from coarse resolution models. In practice, training data are generated from higher resolution numerical simulations transformed in order to mimic coarse resolution simulations. By essence, these strategies optimize subgrid parametrizations to meet so-called $\textit{a priori}$ criteria. But the actual purpose of a subgrid parametrization is to obtain good performance in terms of $\textit{a posteriori}$ metrics which imply computing entire model trajectories. In this paper, we focus on the representation of energy backscatter in two dimensional quasi-geostrophic turbulence and compare parametrizations obtained with different learning strategies at fixed computational complexity. We show that strategies based on $\textit{a priori}$ criteria yield parametrizations that tend to be unstable in direct simulations and describe how subgrid parametrizations can alternatively be trained end-to-end in order to meet $\textit{a posteriori}$ criteria. We illustrate that end-to-end learning strategies yield parametrizations that outperform known empirical and data-driven schemes in terms of performance, stability and ability to apply to different flow configurations. These results support the relevance of differentiable programming paradigms for climate models in the future.  ( 2 min )
    Federated Learning with Partial Model Personalization. (arXiv:2204.03809v1 [cs.LG])
    We consider two federated learning algorithms for training partially personalized models, where the shared and personal parameters are updated either simultaneously or alternately on the devices. Both algorithms have been proposed in the literature, but their convergence properties are not fully understood, especially for the alternating variant. We provide convergence analyses of both algorithms in the general nonconvex setting with partial participation and delineate the regime where one dominates the other. Our experiments on real-world image, text, and speech datasets demonstrate that (a) partial personalization can obtain most of the benefits of full model personalization with a small fraction of personal parameters, and (b) the alternating update algorithm often outperforms the simultaneous update algorithm.  ( 2 min )
    Decomposition-based Generation Process for Instance-Dependent Partial Label Learning. (arXiv:2204.03845v1 [cs.LG])
    Partial label learning (PLL) is a typical weakly supervised learning problem, where each training example is associated with a set of candidate labels among which only one is true. Most existing PLL approaches assume that the incorrect labels in each training example are randomly picked as candidate labels and model the generation process of the candidate labels in a simple way. However, these approaches usually do not perform as well as expected because the generation process of the candidate labels is in fact instance-dependent, and it therefore deserves to be modeled in a more refined way. In this paper, we consider instance-dependent PLL and assume that the generation process of the candidate labels can be decomposed into two sequential parts: the correct label emerges first in the mind of the annotator, and then the incorrect labels related to the features are also selected alongside the correct label as candidates due to labeling uncertainty. Motivated by this consideration, we propose a novel PLL method that performs maximum a posteriori (MAP) inference based on an explicitly modeled generation process of candidate labels via decomposed probability distribution models. Experiments on benchmark and real-world datasets validate the effectiveness of the proposed method.  ( 2 min )
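To make the two-part generation process concrete, here is a minimal sketch (not the paper's model): the correct label enters the candidate set first, and each incorrect label is then added with an instance-dependent probability derived from a hypothetical per-label confusability score.

```python
import numpy as np

def generate_candidate_labels(y_true, label_scores, rng):
    """Decomposed, instance-dependent candidate-label generation (sketch):
    the correct label emerges first, then each incorrect label j is added
    with probability proportional to how confusable it is with the
    instance (label_scores[j], a hypothetical quantity)."""
    n_classes = label_scores.shape[0]
    candidates = {y_true}                      # correct label enters first
    probs = label_scores / label_scores.max()  # normalise confusability to [0, 1]
    for j in range(n_classes):
        if j != y_true and rng.random() < probs[j]:
            candidates.add(j)                  # instance-dependent false positives
    return sorted(candidates)

rng = np.random.default_rng(0)
scores = np.array([0.05, 0.9, 0.6, 0.01])     # hypothetical per-label confusability
cands = generate_candidate_labels(1, scores, rng)
print(cands)  # the candidate set always contains the true label 1
```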
    Q-learning with online random forests. (arXiv:2204.03771v1 [stat.ML])
    $Q$-learning is the most fundamental model-free reinforcement learning algorithm. Deployment of $Q$-learning requires approximation of the state-action value function (also known as the $Q$-function). In this work, we provide online random forests as $Q$-function approximators and propose a novel method wherein the random forest is grown as learning proceeds (through expanding forests). We demonstrate improved performance of our methods over state-of-the-art Deep $Q$-Networks in two OpenAI Gym environments (`blackjack' and `inverted pendulum'), but not in the `lunar lander' environment. We suspect that the resilience to overfitting enjoyed by random forests makes our method well suited to common tasks that do not require a strong representation of the problem domain. We show that expanding forests (in which the number of trees increases as data comes in) improve performance, suggesting that expanding forests are viable for other applications of online random forests beyond the reinforcement learning setting.  ( 2 min )
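The expanding-forest idea can be sketched on a toy chain MDP. This is an illustrative reconstruction, not the authors' code: it refits scikit-learn's `RandomForestRegressor` on a growing replay buffer with an increasing tree count after each episode.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy chain MDP: states 0..4; action 1 moves right (reward at state 4), action 0 left.
N_STATES, GAMMA = 5, 0.9

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, float(s2 == N_STATES - 1)

rng = np.random.default_rng(0)
buffer, forest = [], None
n_trees = 5                                    # forest expands as learning proceeds

for episode in range(30):
    s = 0
    for _ in range(10):
        if forest is None or rng.random() < 0.3:          # epsilon-greedy action
            a = int(rng.integers(2))
        else:
            a = int(np.argmax([forest.predict([[s, b]])[0] for b in (0, 1)]))
        s2, r = step(s, a)
        # Bootstrapped target: r + gamma * max_a' Q(s', a')
        q_next = 0.0 if forest is None else max(
            forest.predict([[s2, b]])[0] for b in (0, 1))
        buffer.append(([s, a], r + GAMMA * q_next))
        s = s2
    X, y = zip(*buffer)
    forest = RandomForestRegressor(n_estimators=n_trees,
                                   random_state=0).fit(np.array(X), np.array(y))
    n_trees += 2                               # "expanding forest": more trees as data comes in

print(forest.n_estimators)  # 63 trees after 30 episodes of growth
```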
    Quantum version of the k-NN classifier based on a quantum sorting algorithm. (arXiv:2204.03761v1 [quant-ph])
    In this work we introduce a quantum sorting algorithm with adaptable requirements of memory and circuit depth, and then use it to develop a new quantum version of the classical machine learning algorithm known as k-nearest neighbors (k-NN). Both the efficiency and performance of this new quantum version of the k-NN algorithm are compared to those of the classical k-NN and another quantum version proposed by Schuld et al. \cite{Int13}. Results show that the two quantum algorithms have similar efficiency, both superior to that of the classical algorithm. On the other hand, the performance of our proposed quantum k-NN algorithm is superior to that of Schuld et al. and similar to that of the classical k-NN.  ( 2 min )
    Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition. (arXiv:2204.03793v1 [eess.AS])
    Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers. In this work, we present Personal VAD 2.0, a personalized voice activity detector that detects the voice activity of a target speaker, as part of a streaming on-device ASR system. Although previous proof-of-concept studies have validated the effectiveness of Personal VAD, there are still several critical challenges to address before this model can be used in production: first, the quality must be satisfactory in both enrollment and enrollment-less scenarios; second, it should operate in a streaming fashion; and finally, the model size should be small enough to fit within a limited latency and CPU/memory budget. To meet these multi-faceted requirements, we propose a series of novel designs: 1) advanced speaker embedding modulation methods; 2) a new training paradigm to generalize to enrollment-less conditions; 3) architecture and runtime optimizations for latency and resource restrictions. Extensive experiments on a realistic speech recognition system demonstrate the state-of-the-art performance of our proposed method.  ( 2 min )
    Free Energy Evaluation Using Marginalized Annealed Importance Sampling. (arXiv:2204.03784v1 [stat.ML])
    The evaluation of the free energy of a stochastic model is a significant issue in various fields of physics and machine learning. However, exact free energy evaluation is computationally infeasible because it includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method, similar to simulated annealing, that can effectively approximate the free energy. This study proposes a new AIS-based approach, referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail from theoretical and numerical perspectives. Based on this investigation, we prove that mAIS is more effective than AIS under a certain condition.  ( 2 min )
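For readers unfamiliar with AIS itself (the sketch below shows plain AIS, not the proposed mAIS): chains are drawn from a tractable base distribution, moved through a sequence of annealed densities with Metropolis steps, and the accumulated importance weights give an estimate of the partition-function ratio. Here the base is a standard normal and the target is an unnormalized N(0, sigma^2), whose true normalizer is sigma*sqrt(2*pi).

```python
import numpy as np

def log_f(x, beta, sigma):
    """Annealed unnormalised log-density: geometric bridge between a
    standard normal (beta=0) and the target N(0, sigma^2) (beta=1)."""
    return (1 - beta) * (-0.5 * x**2) + beta * (-0.5 * x**2 / sigma**2)

def ais_log_z(sigma=2.0, n_chains=2000, n_betas=50, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_betas)
    x = rng.standard_normal(n_chains)          # exact samples at beta = 0
    log_w = np.zeros(n_chains)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Weight update with samples from the previous annealed density
        log_w += log_f(x, b, sigma) - log_f(x, b_prev, sigma)
        # One Metropolis step targeting the new intermediate density
        prop = x + rng.standard_normal(n_chains)
        accept = np.log(rng.random(n_chains)) < log_f(prop, b, sigma) - log_f(x, b, sigma)
        x = np.where(accept, prop, x)
    # E[w] = Z_target / Z_0, with Z_0 = sqrt(2*pi) for the standard normal
    log_ratio = np.log(np.mean(np.exp(log_w - log_w.max()))) + log_w.max()
    return log_ratio + 0.5 * np.log(2 * np.pi)

est = np.exp(ais_log_z())
print(est)  # ≈ sigma * sqrt(2*pi) ≈ 5.01 for sigma = 2
```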
    A Learnable Variational Model for Joint Multimodal MRI Reconstruction and Synthesis. (arXiv:2204.03804v1 [eess.IV])
    Generating multi-contrast/modal MRI of the same anatomy enriches diagnostic information but is limited in practice due to excessive data acquisition time. In this paper, we propose a novel deep-learning model for joint reconstruction and synthesis of multi-modal MRI using incomplete k-space data of several source modalities as inputs. The output of our model includes reconstructed images of the source modalities and a high-quality image synthesized in the target modality. Our proposed model is formulated as a variational problem that leverages several learnable modality-specific feature extractors and a multimodal synthesis module. We propose a learnable optimization algorithm to solve this model, which induces a multi-phase network whose parameters can be trained using multi-modal MRI data. Moreover, a bilevel optimization framework is employed for robust parameter training. We demonstrate the effectiveness of our approach using extensive numerical experiments.  ( 2 min )
    A survey on learning from imbalanced data streams: taxonomy, challenges, empirical study, and reproducible experimental framework. (arXiv:2204.03719v1 [cs.LG])
    Class imbalance poses new challenges when it comes to classifying data streams. Many algorithms recently proposed in the literature tackle this problem using a variety of data-level, algorithm-level, and ensemble approaches. However, there is a lack of standardized and agreed-upon procedures for how to evaluate these algorithms. This work presents a taxonomy of algorithms for imbalanced data streams and proposes a standardized, exhaustive, and informative experimental testbed to evaluate algorithms in a collection of diverse and challenging imbalanced data stream scenarios. The experimental study evaluates 24 state-of-the-art data stream algorithms on 515 imbalanced data streams that combine static and dynamic class imbalance ratios, instance-level difficulties, concept drift, and real-world and semi-synthetic datasets in binary and multi-class scenarios. This leads to the largest experimental study conducted so far in the data stream mining domain. We discuss the advantages and disadvantages of state-of-the-art classifiers in each of these scenarios and provide general recommendations to end-users for selecting the best algorithms for imbalanced data streams. Additionally, we formulate open challenges and future directions for this domain. Our experimental testbed is fully reproducible and easy to extend with new methods. In this way, we propose the first standardized approach to conducting experiments in imbalanced data streams that can be used by other researchers to create a trustworthy and fair evaluation of newly proposed methods. Our experimental framework can be downloaded from https://github.com/canoalberto/imbalanced-streams.  ( 2 min )
    Decentralized Event-Triggered Federated Learning with Heterogeneous Communication Thresholds. (arXiv:2204.03726v1 [cs.LG])
    A recent emphasis of distributed learning research has been on federated learning (FL), in which model training is conducted by the data-collecting devices. Existing research on FL has mostly focused on a star topology learning architecture with synchronized (time-triggered) model training rounds, where the local models of the devices are periodically aggregated by a centralized coordinating node. However, in many settings, such a coordinating node may not exist, motivating efforts to fully decentralize FL. In this work, we propose a novel methodology for distributed model aggregations via asynchronous, event-triggered consensus iterations over the network graph topology. We consider heterogeneous communication event thresholds at each device that weigh the change in local model parameters against the available local resources in deciding the benefit of aggregations at each iteration. Through theoretical analysis, we demonstrate that our methodology achieves asymptotic convergence to the globally optimal learning model under standard assumptions in distributed learning and graph consensus literature, and without restrictive connectivity requirements on the underlying topology. Subsequent numerical results demonstrate that our methodology obtains substantial improvements in communication requirements compared with FL baselines.  ( 2 min )
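A minimal sketch of the event-triggering rule (an illustration, not the paper's algorithm): each device compares the drift of its local model from the last shared copy against its own threshold, and communicates only when that threshold is exceeded.

```python
import numpy as np

def should_broadcast(theta, theta_last_sent, threshold):
    """Event trigger (sketch): communicate only when the local model has
    drifted from the last shared copy by more than a device-specific
    threshold, which could be scaled by the device's available resources."""
    return np.linalg.norm(theta - theta_last_sent) > threshold

rng = np.random.default_rng(0)
theta_sent = np.zeros(3)
theta = theta_sent.copy()
events = 0
for t in range(100):
    theta += 0.01 * rng.standard_normal(3)     # simulated local training drift
    if should_broadcast(theta, theta_sent, threshold=0.05):
        theta_sent = theta.copy()              # neighbour aggregation would happen here
        events += 1
print(events)  # far fewer communication events than 100 time-triggered rounds
```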
    Global ECG Classification by Self-Operational Neural Networks with Feature Injection. (arXiv:2204.03768v1 [cs.LG])
    Objective: Global (inter-patient) ECG classification for arrhythmia detection over Electrocardiogram (ECG) signals is a challenging task for both humans and machines. The main reason is the significant variation of both normal and arrhythmic ECG patterns among patients. Automating this process with utmost accuracy is, therefore, highly desirable due to the advent of wearable ECG sensors. However, even with numerous deep learning approaches proposed recently, there is still a notable gap between global and patient-specific ECG classification performance. This study proposes a novel approach to narrow this gap and offers a real-time solution with shallow and compact 1D Self-Organized Operational Neural Networks (Self-ONNs). Methods: In this study, we propose a novel approach for inter-patient ECG classification using a compact 1D Self-ONN by exploiting morphological and timing information in heart cycles. We used 1D Self-ONN layers to automatically learn morphological representations from ECG data, enabling us to capture the shape of the ECG waveform around the R peaks. We further inject temporal features based on the RR interval for timing characterization. The classification layers can thus benefit from both temporal and learned features for the final arrhythmia classification. Results: Using the MIT-BIH arrhythmia benchmark database, the proposed method achieves the highest classification performance ever reported, i.e., 99.21% precision, 99.10% recall, and 99.15% F1-score for normal (N) segments; 82.19% precision, 82.50% recall, and 82.34% F1-score for supra-ventricular ectopic beats (SVEBs); and finally, 94.41% precision, 96.10% recall, and 95.2% F1-score for ventricular ectopic beats (VEBs).  ( 2 min )
    Automated Design of Salient Object Detection Algorithms with Brain Programming. (arXiv:2204.03722v1 [cs.CV])
    Despite recent improvements in computer vision, artificial visual systems' design is still daunting since an explanation of visual computing algorithms remains elusive. Salient object detection is one problem that is still open due to the difficulty of understanding the brain's inner workings. Progress in this research area has followed the traditional path of hand-made designs using neuroscience knowledge. In recent years, two different approaches based on genetic programming have appeared that enhance this technique. One follows the idea of combining previous hand-made methods through genetic programming and fuzzy logic. The other approach consists of improving the inner computational structures of basic hand-made models through artificial evolution. This research work proposes expanding the artificial dorsal stream using a recent proposal to solve salient object detection problems. This approach uses the benefits of the two main aspects of this research area: fixation prediction and detection of salient objects. We decided to apply the fusion of visual saliency and image segmentation algorithms as a template. The proposed methodology discovers several critical structures in the template through artificial evolution. We present results on a benchmark designed by experts, with outstanding performance in comparison with the state of the art.  ( 2 min )
    Compositional Generalization and Decomposition in Neural Program Synthesis. (arXiv:2204.03758v1 [cs.LG])
    When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, what we can measure is whether they compositionally generalize, that is, whether a model that has been trained on the simpler subtasks is subsequently able to solve more complex tasks. In this paper, we focus on measuring the ability of learned program synthesizers to compositionally generalize. We first characterize several different axes along which program synthesis methods would be desired to generalize, e.g., length generalization, or the ability to combine known subroutines in new ways that do not occur in the training data. Based on this characterization, we introduce a benchmark suite of tasks to assess these abilities based on two popular existing datasets, SCAN and RobustFill. Finally, we make first attempts to improve the compositional generalization ability of Transformer models along these axes through novel attention mechanisms that draw inspiration from a human-like decomposition strategy. Empirically, we find our modified Transformer models generally perform better than natural baselines, but the tasks remain challenging.  ( 2 min )
    Physics-assisted Generative Adversarial Network for X-Ray Tomography. (arXiv:2204.03703v1 [eess.IV])
    X-ray tomography is capable of imaging the interior of objects in three dimensions non-invasively, with applications in biomedical imaging, materials science, electronic inspection, and other fields. The reconstruction process can be an ill-conditioned inverse problem, requiring regularization to obtain satisfactory reconstructions. Recently, deep learning has been adopted for tomographic reconstruction. Unlike iterative algorithms, which require a distribution that is known a priori, deep reconstruction networks can learn a prior distribution through sampling the training distributions. In this work, we develop a Physics-assisted Generative Adversarial Network (PGAN), a two-step algorithm for tomographic reconstruction. In contrast to previous efforts, our PGAN utilizes maximum-likelihood estimates derived from the measurements to regularize the reconstruction with both known physics and the learned prior. The synthetic objects with spatial correlations are integrated circuits (ICs) generated by a proposed model, CircuitFaker. Compared with maximum-likelihood estimation, PGAN can reduce the photon requirement with limited projection angles to achieve a given error rate. We further attribute the improvement to the learned prior by reconstructing objects created without spatial correlations. The advantages of using a prior from deep learning in X-ray tomography may further enable low-photon nanoscale imaging.  ( 2 min )
    GreaseVision: Rewriting the Rules of the Interface. (arXiv:2204.03731v1 [cs.HC])
    Digital harms can manifest across any interface. Key problems in addressing these harms include the high individuality of harms and the fast-changing nature of digital systems. As a result, we still lack a systematic approach to study harms and produce interventions for end-users. We put forward GreaseVision, a new framework that enables end-users to collaboratively develop interventions against harms in software using a no-code approach and recent advances in few-shot machine learning. The contribution of the framework and tool allow individual end-users to study their usage history and create personalized interventions. Our contribution also enables researchers to study the distribution of harms and interventions at scale.  ( 2 min )
    Mixing Signals: Data Augmentation Approach for Deep Learning Based Modulation Recognition. (arXiv:2204.03737v1 [eess.SP])
    With the rapid development of deep learning, automatic modulation recognition (AMR), as an important task in cognitive radio, has gradually transformed from traditional feature extraction and classification to automatic classification with deep learning technology. However, deep learning models are data-driven methods, which often require a large amount of data for training. Data augmentation, as a strategy for expanding a dataset, can improve the generalization of deep learning models and thus improve their accuracy to a certain extent. In this paper, for AMR of radio signals, we propose a data augmentation strategy based on mixing signals and consider four specific methods (Random Mixing, Maximum-Similarity-Mixing, $\theta$-Similarity Mixing, and n-times Random Mixing) to achieve data augmentation. Experiments show that our proposed method can improve the classification accuracy of deep learning based AMR models on the full public dataset RML2016.10a. In particular, for the case of a single signal-to-noise-ratio signal set, the classification accuracy can be significantly improved, which verifies the effectiveness of the methods.  ( 2 min )
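The simplest of the four methods, Random Mixing, can be sketched as below. This is an illustrative reconstruction; the mixing coefficient `lam` and the same-class pairing (so that the mixed signal keeps its label) are assumptions, not details taken from the paper.

```python
import numpy as np

def random_mixing(signals, labels, n_aug, lam=0.5, seed=0):
    """Random Mixing (sketch): create augmented samples by linearly
    mixing two randomly chosen signals of the same modulation class,
    so the class label of the mixture is unambiguous."""
    rng = np.random.default_rng(seed)
    aug_x, aug_y = [], []
    for _ in range(n_aug):
        c = rng.choice(np.unique(labels))           # pick a class
        idx = np.flatnonzero(labels == c)
        i, j = rng.choice(idx, size=2, replace=True)
        aug_x.append(lam * signals[i] + (1 - lam) * signals[j])
        aug_y.append(c)
    return np.stack(aug_x), np.array(aug_y)

# Toy I/Q signals: 6 samples, 2 classes, 128 complex points as 2x128 real arrays
rng = np.random.default_rng(1)
x = rng.standard_normal((6, 2, 128))
y = np.array([0, 0, 0, 1, 1, 1])
xa, ya = random_mixing(x, y, n_aug=10)
print(xa.shape, ya.shape)  # (10, 2, 128) (10,)
```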
    A Kernel Method to Nonlinear Location Estimation with RSS-based Fingerprint. (arXiv:2204.03724v1 [cs.NI])
    This paper presents a nonlinear location estimation method to infer the position of a user holding a smartphone. We consider a large location with $M$ grid points, each labeled with a unique fingerprint consisting of the received signal strength (RSS) values measured from $N$ Bluetooth Low Energy (BLE) beacons. Given the fingerprint observed by the smartphone, the user's current location can be estimated by finding the top-k most similar fingerprints in the database. Besides environmental factors, the dynamicity of holding the smartphone is another source of variation in fingerprint measurements, yet few studies have addressed the fingerprint variability due to dynamic smartphone positions held by human hands during online detection. To this end, we propose a nonlinear location estimation using the kernel method. Specifically, our proposed method comprises two steps: 1) a beacon selection strategy to select a subset of beacons that is insensitive to subtle changes of holding position, and 2) a kernel method to compute the similarity between this subset of observed signals and all the fingerprints registered in the database. Experimental results based on large-scale data collected in a complex building indicate a substantial performance gain of our proposed approach in comparison to state-of-the-art methods. The dataset consisting of the signal information collected from the beacons is available online.  ( 2 min )
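A minimal sketch of step 2 on toy data: the RBF kernel choice and the similarity-weighted centroid readout are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def localize(observed, fingerprints, positions, k=3, gamma=0.05):
    """Kernel-based location estimate (sketch): score each registered
    fingerprint with an RBF kernel against the observed RSS vector,
    then average the positions of the top-k most similar grid points."""
    d2 = np.sum((fingerprints - observed) ** 2, axis=1)
    sim = np.exp(-gamma * d2)                  # RBF kernel similarity
    top = np.argsort(sim)[-k:]                 # indices of the top-k grid points
    w = sim[top] / sim[top].sum()
    return w @ positions[top]                  # similarity-weighted centroid

# Toy database: M=4 grid points, N=3 beacons (RSS in dBm), 2-D coordinates
fp = np.array([[-60., -70., -80.],
               [-65., -60., -75.],
               [-80., -65., -60.],
               [-75., -80., -65.]])
pos = np.array([[0., 0.], [0., 1.], [1., 1.], [1., 0.]])
obs = np.array([-62., -69., -79.])            # closest to grid point 0
est = localize(obs, fp, pos, k=2)
print(est)  # close to grid point 0 at (0, 0)
```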
    Introducing a Framework and a Decision Protocol to Calibrate Recommender Systems. (arXiv:2204.03706v1 [cs.IR])
    Recommender systems use the user's profile to generate a recommendation list of unknown items for a target user. Although the primary goal of traditional recommender systems is to deliver the most relevant items, such an effort can unintentionally cause collateral effects, including low diversity and unbalanced genres or categories, benefiting particular groups of categories. This paper proposes an approach to create recommendation lists with a calibrated balance of genres, avoiding disproportion between the user's profile interests and the recommendation list. The calibrated recommendations consider concomitantly the relevance and the divergence between the genre distributions extracted from the user's preferences and the recommendation list. The main claim is that calibration can contribute positively to generating fairer recommendations. In particular, we propose a new trade-off equation that considers the user's bias to provide a recommendation list that reflects the user's tendencies. Moreover, we propose a conceptual framework and a decision protocol to generate more than one thousand combinations of calibrated systems in order to find the best combination. We compare our approach against state-of-the-art approaches using multiple-domain datasets, which are analyzed with rank and calibration metrics. The results indicate that the trade-off, which considers the user's bias, produces positive effects on precision and fairness, generating recommendation lists that respect the genre distribution; through the decision protocol, we also found the best system for each dataset.  ( 2 min )
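A calibrated trade-off of this kind is commonly written as maximizing (1 - lambda) * relevance - lambda * divergence over candidate lists and solved greedily. The sketch below follows that generic recipe with a KL divergence over genre distributions; it is an illustration of the idea, not the paper's proposed equation.

```python
import numpy as np

def genre_dist(items, genres, n_genres, alpha=0.01):
    """Genre distribution of an item list, smoothed to avoid zeros."""
    counts = np.bincount([genres[i] for i in items], minlength=n_genres)
    p = counts + alpha
    return p / p.sum()

def calibrated_rerank(relevance, genres, user_dist, size, lam=0.5):
    """Greedy calibrated reranking (sketch): each step adds the item that
    maximises (1 - lam) * total relevance - lam * KL(user_dist || list_dist)."""
    n_genres = len(user_dist)
    chosen = []
    for _ in range(size):
        best, best_score = None, -np.inf
        for i in range(len(relevance)):
            if i in chosen:
                continue
            q = genre_dist(chosen + [i], genres, n_genres)
            kl = np.sum(user_dist * np.log(user_dist / q))
            score = (1 - lam) * (sum(relevance[j] for j in chosen)
                                 + relevance[i]) - lam * kl
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

rel = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
genres = [0, 0, 0, 1, 1]                       # items 3 and 4 belong to genre 1
user = np.array([0.5, 0.5])                    # user likes both genres equally
top = calibrated_rerank(rel, genres, user, size=2, lam=0.8)
print(top)  # → [0, 3]: relevance alone would pick [0, 1], calibration adds genre 1
```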
    TemporalUV: Capturing Loose Clothing with Temporally Coherent UV Coordinates. (arXiv:2204.03671v1 [cs.CV])
    We propose a novel approach to generate temporally coherent UV coordinates for loose clothing. Our method is not constrained by human body outlines and can capture loose garments and hair. We implemented a differentiable pipeline to learn UV mapping between a sequence of RGB inputs and textures via UV coordinates. Instead of treating the UV coordinates of each frame separately, our data generation approach connects all UV coordinates via feature matching for temporal stability. Subsequently, a generative model is trained to balance the spatial quality and temporal stability. It is driven by supervised and unsupervised losses in both UV and image spaces. Our experiments show that the trained models output high-quality UV coordinates and generalize to new poses. Once a sequence of UV coordinates has been inferred by our model, it can be used to flexibly synthesize new looks and modified visual styles. Compared to existing methods, our approach reduces the computational workload to animate new outfits by several orders of magnitude.  ( 2 min )
    T4PdM: a Deep Neural Network based on the Transformer Architecture for Fault Diagnosis of Rotating Machinery. (arXiv:2204.03725v1 [cs.AI])
    Deep learning and big data algorithms have become widely used in industrial applications to optimize several tasks in many complex systems. In particular, deep learning models for diagnosing and prognosticating machinery health have made predictive maintenance (PdM) more accurate and reliable in decision making, thereby avoiding unnecessary interventions, machinery accidents, and environmental catastrophes. Recently, Transformer neural networks have gained notoriety and have increasingly become the favorite choice for Natural Language Processing (NLP) tasks. Thus, given their recent major achievements in NLP, this paper proposes an automatic fault classifier model for predictive maintenance based on a modified version of the Transformer architecture, namely T4PdM, to identify multiple types of faults in rotating machinery. Experimental results are presented for the MaFaulDa and CWRU databases. T4PdM achieved an overall accuracy of 99.98% and 98% for the two datasets, respectively. In addition, the performance of the proposed model is compared to other previously published works, demonstrating the superiority of the model in detecting and classifying faults in rotating industrial machinery. The proposed Transformer-based model can therefore improve the performance of machinery fault analysis and diagnostic processes and help companies move toward Industry 4.0. In addition, this methodology can be adapted to any other task of time series classification.  ( 2 min )
    Brain-Inspired Hyperdimensional Computing: How Thermal-Friendly for Edge Computing?. (arXiv:2204.03739v1 [cs.ET])
    Brain-inspired hyperdimensional computing (HDC) is an emerging machine learning (ML) method. It is based on large vectors of binary or bipolar symbols and a few simple mathematical operations. The promise of HDC is a highly efficient implementation for embedded systems like wearables. While fast implementations have been presented, other constraints have not been considered for edge computing. In this work, we aim at answering how thermal-friendly HDC is for edge computing. Devices like smartwatches, smart glasses, or even mobile systems have a restrictive cooling budget due to their limited volume. Although HDC operations are simple, the vectors are large, resulting in a high number of CPU operations and thus a heavy load on the entire system, potentially causing temperature violations. In this work, the impact of HDC on the chip's temperature is investigated for the first time. We measure the temperature and power consumption of a commercial embedded system and compare HDC with a conventional CNN. We reveal that HDC causes up to 6.8°C higher temperatures and leads to up to 47% more CPU throttling. Even when both HDC and the CNN aim for the same throughput (i.e., perform a similar number of classifications per second), HDC still causes higher on-chip temperatures due to its larger power consumption.  ( 2 min )
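The basic HDC operations behind the measured workload (random bipolar hypervectors, bundling noisy samples into class prototypes, and dot-product similarity for classification) can be sketched as below; the dimensionality D = 10,000 is a typical illustrative choice and shows why the per-classification CPU load is large despite the simple arithmetic.

```python
import numpy as np

D = 10_000                                     # large vectors are HDC's hallmark
rng = np.random.default_rng(0)

def bipolar(n=1):
    """Random bipolar hypervector(s) with symbols in {-1, +1}."""
    return rng.choice([-1, 1], size=(n, D))

def noisy(v, flips=1000):
    """Simulated sample: the class's base vector with 10% of symbols flipped."""
    idx = rng.choice(D, size=flips, replace=False)
    out = v.copy()
    out[0, idx] *= -1
    return out

# Bundle (element-wise sum + sign) five noisy samples into each class prototype
base_a, base_b = bipolar(), bipolar()
proto_a = np.sign(sum(noisy(base_a) for _ in range(5)))
proto_b = np.sign(sum(noisy(base_b) for _ in range(5)))

def classify(query):
    """Nearest prototype by dot-product similarity."""
    sims = [int((query * p).sum()) for p in (proto_a, proto_b)]
    return int(np.argmax(sims))

print(classify(noisy(base_a)))  # → 0
```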
    Using Multiple Self-Supervised Tasks Improves Model Robustness. (arXiv:2204.03714v1 [cs.CV])
    Deep networks achieve state-of-the-art performance on computer vision tasks, yet they fail under adversarial attacks that are imperceptible to humans. In this paper, we propose a novel defense that can dynamically adapt the input using the intrinsic structure from multiple self-supervised tasks. By simultaneously using many self-supervised tasks, our defense avoids over-fitting the adapted image to one specific self-supervised task and restores more intrinsic structure in the image compared to a single self-supervised task approach. Our approach further improves robustness and clean accuracy significantly compared to the state-of-the-art single task self-supervised defense. Our work is the first to connect multiple self-supervised tasks to robustness, and suggests that we can achieve better robustness with more intrinsic signal from visual data.  ( 2 min )
    BankNote-Net: Open dataset for assistive universal currency recognition. (arXiv:2204.03738v1 [cs.CV])
    Millions of people around the world have low or no vision. Assistive software applications have been developed for a variety of day-to-day tasks, including optical character recognition, scene identification, person recognition, and currency recognition. This last task, the recognition of banknotes from different denominations, has been addressed by the use of computer vision models for image recognition. However, the datasets and models available for this task are limited, both in terms of dataset size and in the variety of currencies covered. In this work, we collect a total of 24,826 images of banknotes in a variety of assistive settings, spanning 17 currencies and 112 denominations. Using supervised contrastive learning, we develop a machine learning model for universal currency recognition. This model learns compliant embeddings of banknote images in a variety of contexts, which can be shared publicly (as a compressed vector representation) and can be used to train and test specialized downstream models for any currency, including those not covered by our dataset or for which only a few real images per denomination are available (few-shot learning). We deploy a variation of this model for public use in the latest version of the Seeing AI app developed by Microsoft. We share our encoder model and the embeddings as an open dataset in our BankNote-Net repository.  ( 2 min )
    Learning to Walk Autonomously via Reset-Free Quality-Diversity. (arXiv:2204.03655v1 [cs.LG])
    Quality-Diversity (QD) algorithms can discover large and complex behavioural repertoires consisting of both diverse and high-performing skills. However, the generation of behavioural repertoires has mainly been limited to simulation environments instead of real-world learning. This is because existing QD algorithms need large numbers of evaluations as well as episodic resets, which require manual human supervision and interventions. This paper proposes Reset-Free Quality-Diversity optimization (RF-QD) as a step towards autonomous learning for robotics in open-ended environments. We build on Dynamics-Aware Quality-Diversity (DA-QD) and introduce a behaviour selection policy that leverages the diversity of the imagined repertoire and environmental information to intelligently select behaviours that can act as automatic resets. We demonstrate this through a task of learning to walk within defined training zones with obstacles. Our experiments show that we can learn full repertoires of legged locomotion controllers autonomously without manual resets and with high sample efficiency in spite of harsh safety constraints. Finally, using an ablation of different target objectives, we show that it is important for RF-QD to have diverse types of solutions available to the behaviour selection policy rather than solutions optimised with a specific objective. Videos and code available at https://sites.google.com/view/rf-qd.  ( 2 min )
    Identification of Autism spectrum disorder based on a novel feature selection method and Variational Autoencoder. (arXiv:2204.03654v1 [eess.IV])
    The development of noninvasive brain imaging such as resting-state functional magnetic resonance imaging (rs-fMRI), and its combination with AI algorithms, provides a promising solution for the early diagnosis of Autism spectrum disorder (ASD). However, the performance of current ASD classification based on rs-fMRI still needs to be improved. This paper introduces a classification framework to aid ASD diagnosis based on rs-fMRI. In the framework, we propose a novel filter feature selection method based on the difference between step distribution curves (DSDC) to select remarkable functional connectivities (FCs), and we utilize a multilayer perceptron (MLP) pretrained by a simplified Variational Autoencoder (VAE) for classification. We also designed a pipeline consisting of a normalization procedure and a modified hyperbolic tangent (tanh) activation function to replace the original tanh function, further improving the model accuracy. Our model was evaluated by 10 times 10-fold cross-validation and achieved an average accuracy of 78.12%, outperforming the state-of-the-art methods reported on the same dataset. Given the importance of sensitivity and specificity in disease diagnosis, two constraints were designed in our model that can improve its sensitivity and specificity by up to 9.32% and 10.21%, respectively. The added constraints allow our model to handle different application scenarios and can be used broadly.  ( 2 min )
    Qade: Solving Differential Equations on Quantum Annealers. (arXiv:2204.03657v1 [quant-ph])
    We present a general method, called Qade, for solving differential equations using a quantum annealer. The solution is obtained as a linear combination of a set of basis functions. On current devices, Qade can solve systems of coupled partial differential equations that depend linearly on the solution and its derivatives, with non-linear variable coefficients and arbitrary inhomogeneous terms. We test the method with several examples and find that state-of-the-art quantum annealers can find the solution accurately for problems requiring a small enough function basis. We provide a Python package implementing the method at gitlab.com/jccriado/qade.  ( 2 min )
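    The construction is classical even if the minimization runs on an annealer: write the solution as a linear combination of basis functions and minimize a quadratic residual in the coefficients. A minimal sketch of that formulation, solving u' = u, u(0) = 1 with a monomial basis and an ordinary least-squares solver standing in for the annealer (all names are illustrative, not Qade's API):

```python
import numpy as np

# Hypothetical classical analogue of Qade's formulation: express the solution
# of u'(t) = u(t), u(0) = 1 as a linear combination of monomial basis functions
# and minimize the quadratic residual that an annealer would encode.
t = np.linspace(0.0, 1.0, 50)                  # collocation points
K = 6                                          # basis size
Phi = np.stack([t**k for k in range(K)], axis=1)
dPhi = np.stack([k * t**(k - 1) if k > 0 else np.zeros_like(t)
                 for k in range(K)], axis=1)   # basis derivatives

# Residual of u' - u = 0 plus a weighted penalty enforcing u(0) = 1.
M = np.vstack([dPhi - Phi, 10.0 * Phi[0:1, :]])
rhs = np.concatenate([np.zeros_like(t), [10.0]])
coef, *_ = np.linalg.lstsq(M, rhs, rcond=None)

u = Phi @ coef                                 # approximates exp(t)
print(float(np.max(np.abs(u - np.exp(t)))))
```

On the annealer, the same quadratic form over the coefficients is what gets encoded and minimized.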
    Adaptive-Gravity: A Defense Against Adversarial Samples. (arXiv:2204.03694v1 [cs.LG])
    This paper presents a novel model training solution, denoted as Adaptive-Gravity, for enhancing the robustness of deep neural network classifiers against adversarial examples. We conceptualize the model parameters/features associated with each class as a mass characterized by its centroid location and the spread (standard deviation of the distance) of features around the centroid. We use the centroid associated with each cluster to derive an anti-gravity force that pushes the centroids of different classes away from one another during network training. We then customize an objective function that aims to concentrate each class's features toward their corresponding new centroid, which has been obtained by the anti-gravity force. This methodology results in a larger separation between different masses and reduces the spread of features around each centroid. As a result, the samples are pushed away from the space that adversarial examples could be mapped to, effectively increasing the degree of perturbation needed for making an adversarial example. We have implemented this training solution as an iterative method consisting of four steps at each iteration: 1) centroid extraction, 2) anti-gravity force calculation, 3) centroid relocation, and 4) gravity training. Gravity's efficiency is evaluated by measuring the corresponding fooling rates against various attack models, including FGSM, MIM, BIM, and PGD using LeNet and ResNet110 networks, benchmarked against MNIST and CIFAR10 classification problems. Test results show that Gravity not only functions as a powerful instrument to robustify a model against state-of-the-art adversarial attacks but also effectively improves the model training accuracy.  ( 2 min )
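    The first three steps of the iteration can be sketched on toy features as follows; the inverse-square force law and step size are illustrative assumptions, and the paper's exact update may differ:

```python
import numpy as np

# Toy sketch of one Adaptive-Gravity iteration: centroid extraction,
# anti-gravity force calculation, and centroid relocation on random features.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))
labels = rng.integers(0, 3, size=100)

# 1) centroid extraction
centroids = np.stack([features[labels == c].mean(axis=0) for c in range(3)])

# 2) anti-gravity force: each centroid is repelled by every other centroid
forces = np.zeros_like(centroids)
for i in range(3):
    for j in range(3):
        if i != j:
            d = centroids[i] - centroids[j]
            forces[i] += d / (np.linalg.norm(d) ** 3 + 1e-8)

# 3) centroid relocation (step 4, "gravity training", would then pull each
# class's features toward its relocated centroid)
new_centroids = centroids + 0.1 * forces
print(new_centroids.shape)
```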
  • Open

    MINIMALIST: Mutual INformatIon Maximization for Amortized Likelihood Inference from Sampled Trajectories. (arXiv:2106.01808v3 [cs.LG] UPDATED)
    Simulation-based inference enables learning the parameters of a model even when its likelihood cannot be computed in practice. One class of methods uses data simulated with different parameters to infer models of the likelihood-to-evidence ratio, or equivalently the posterior function. Here we frame the inference task as an estimation of an energy function parametrized with an artificial neural network. We present an intuitive approach where the optimal model of the likelihood-to-evidence ratio is found by maximizing the likelihood of simulated data. Within this framework, the connection between the task of simulation-based inference and mutual information maximization is clear, and we show how several known methods of posterior estimation relate to alternative lower bounds to mutual information. These distinct objective functions aim at the same optimal energy form and therefore can be directly benchmarked. We compare their accuracy in the inference of model parameters, focusing on four dynamical systems that encompass common challenges in time series analysis: dynamics driven by multiplicative noise, nonlinear interactions, chaotic behavior, and high-dimensional parameter space.  ( 2 min )
    Overlapping Spaces for Compact Graph Representations. (arXiv:2007.02445v3 [cs.LG] UPDATED)
    Various non-trivial spaces are becoming popular for embedding structured data such as graphs, texts, or images. Following spherical and hyperbolic spaces, more general product spaces have been proposed. However, searching for the best configuration of product space is a resource-intensive procedure, which reduces the practical applicability of the idea. We generalize the concept of product space and introduce an overlapping space that does not have the configuration search problem. The main idea is to allow subsets of coordinates to be shared between spaces of different types (Euclidean, hyperbolic, spherical). As a result, parameter optimization automatically learns the optimal configuration. Additionally, overlapping spaces allow for more compact representations since their geometry is more complex. Our experiments confirm that overlapping spaces outperform the competitors in graph embedding tasks. Here, we consider both distortion setup, where the aim is to preserve distances, and ranking setup, where the relative order should be preserved. The proposed method effectively solves the problem and outperforms the competitors in both settings. We also perform an empirical analysis in a realistic information retrieval task, where we compare all spaces by incorporating them into DSSM. In this case, the proposed overlapping space consistently achieves nearly optimal results without any configuration tuning. This allows for reducing training time, which can be significant in large-scale applications.  ( 2 min )
    Identifiability of Label Noise Transition Matrix. (arXiv:2202.02016v2 [cs.LG] UPDATED)
    The noise transition matrix plays a central role in the problem of learning from noisy labels. Among many other reasons, a significant number of existing solutions rely on access to it. Estimating the transition matrix without using ground truth labels is a critical and challenging task. When label noise transition depends on each instance, the problem of identifying the instance-dependent noise transition matrix becomes substantially more challenging. Despite recent works proposing solutions for learning from instance-dependent noisy labels, we lack a unified understanding of when such a problem remains identifiable, and therefore learnable. This paper seeks to provide answers to a sequence of related questions: What are the primary factors that contribute to the identifiability of a noise transition matrix? Can we explain the observed empirical successes? When a problem is not identifiable, what can we do to make it so? We will relate our theoretical findings to the literature and hope to provide guidelines for developing effective solutions for battling instance-dependent label noise.  ( 2 min )
    Active Linear Regression for $\ell_p$ Norms and Beyond. (arXiv:2111.04888v3 [cs.LG] UPDATED)
    We study active sampling algorithms for linear regression, which aim to query only a few entries of a target vector $b\in\mathbb R^n$ and output a near minimizer to $\min_{x\in\mathbb R^d} \|Ax-b\|$, for a design matrix $A\in\mathbb R^{n \times d}$ and loss $\|\cdot\|$. For $p$ norm regression for any $0<p<\infty$, we give an algorithm based on Lewis weight sampling outputting a $(1+\epsilon)$-approximate solution using just $\tilde O(d/\epsilon^2)$ queries to $b$ for $p\in(0,1)$, $\tilde{O}(d/\epsilon)$ queries for $1<p<2$, and $\tilde{O}(d^{p/2}/\epsilon^p)$ queries for $2<p<\infty$. For $0<p<2$, our bounds are optimal up to log factors, settling the query complexity for this range. For $2<p<\infty$, our dependence on $d$ is optimal, while our dependence on $\epsilon$ is off by at most $\epsilon$, up to log factors. Our result resolves an open question of [CD21], who gave near optimal bounds for the $1$ norm, but required $d^2/\epsilon^2$ samples for $\ell_p$ regression with $1<p<2$, and gave no bounds for $2<p<\infty$ or $0<p<1$. We also give the first total sensitivity bound of $O(d^{\max\{1,p/2\}}\log^2n)$ for loss functions of degree $p$ polynomial growth, improving a result of [TMF20]. By combining this with our techniques for $\ell_p$ regression, we obtain an active regression algorithm making $\tilde O(d^{1+\max\{1,p/2\}}/\mathrm{poly}(\epsilon))$ queries for such loss functions, including the Tukey and Huber losses, answering another question of [CD21]. For the Huber loss, we further improve our bound to $\tilde O(d^{4-2\sqrt2}/\mathrm{poly}(\epsilon))$ samples. Our sensitivity bounds also have many applications, including Orlicz norm subspace embeddings, robust subspace approximation, and dimension reduction for smoothed $p$-norms. Finally, our active sampling results give the first sublinear time algorithms for Kronecker product regression under every $p$ norm.  ( 3 min )
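    For the special case p = 2, Lewis weights reduce to leverage scores, and the active-sampling idea can be sketched directly: query b only at rows sampled with probability proportional to their leverage, then solve the reweighted least-squares problem (the oversampling constant 20 below is an arbitrary illustrative choice, not the paper's):

```python
import numpy as np

# Leverage-score sampling, the p = 2 case of Lewis weight sampling: query a
# small random subset of b and solve a reweighted least-squares problem.
rng = np.random.default_rng(0)
n, d = 2000, 5
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)     # queried only where sampled

U, _, _ = np.linalg.svd(A, full_matrices=False)
leverage = (U ** 2).sum(axis=1)                # row leverage scores, sum = d
probs = np.minimum(1.0, 20 * leverage)         # ~ 20 d expected queries
mask = rng.random(n) < probs
S = 1.0 / np.sqrt(probs[mask])                 # inverse-probability reweighting
x_hat, *_ = np.linalg.lstsq(S[:, None] * A[mask], S * b[mask], rcond=None)

print(int(mask.sum()), float(np.linalg.norm(x_hat - x_true)))
```

Only a small fraction of the 2000 entries of b are ever queried, yet the recovered coefficients stay close to the truth.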
    On the Convergence of Stochastic Extragradient for Bilinear Games using Restarted Iteration Averaging. (arXiv:2107.00464v4 [math.OC] UPDATED)
    We study the stochastic bilinear minimax optimization problem, presenting an analysis of the same-sample Stochastic ExtraGradient (SEG) method with constant step size, and presenting variations of the method that yield favorable convergence. In sharp contrasts with the basic SEG method whose last iterate only contracts to a fixed neighborhood of the Nash equilibrium, SEG augmented with iteration averaging provably converges to the Nash equilibrium under the same standard settings, and such a rate is further improved by incorporating a scheduled restarting procedure. In the interpolation setting where noise vanishes at the Nash equilibrium, we achieve an optimal convergence rate up to tight constants. We present numerical experiments that validate our theoretical findings and demonstrate the effectiveness of the SEG method when equipped with iteration averaging and restarting.  ( 2 min )
    Covariance-Free Sparse Bayesian Learning. (arXiv:2105.10439v2 [eess.SP] UPDATED)
    Sparse Bayesian learning (SBL) is a powerful framework for tackling the sparse coding problem while also providing uncertainty quantification. The most popular inference algorithms for SBL exhibit prohibitively large computational costs for high-dimensional problems due to the need to maintain a large covariance matrix. To resolve this issue, we introduce a new method for accelerating SBL inference -- named covariance-free expectation maximization (CoFEM) -- that avoids explicit computation of the covariance matrix. CoFEM solves multiple linear systems to obtain unbiased estimates of the posterior statistics needed by SBL. This is accomplished by exploiting innovations from numerical linear algebra such as preconditioned conjugate gradient and a little-known diagonal estimation rule. For a large class of compressed sensing matrices, we provide theoretical justifications for why our method scales well in high-dimensional settings. Through simulations, we show that CoFEM can be up to thousands of times faster than existing baselines without sacrificing coding accuracy. Through applications to calcium imaging deconvolution and multi-contrast MRI reconstruction, we show that CoFEM enables SBL to tractably tackle high-dimensional sparse coding problems of practical interest.  ( 2 min )
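    The covariance-free step can be sketched in a few lines: the "little-known diagonal estimation rule" estimates diag(A^{-1}) from Rademacher probe vectors and linear solves, never forming the inverse (here np.linalg.solve stands in for the preconditioned conjugate gradient solver the paper uses):

```python
import numpy as np

# Sketch of the covariance-free idea behind CoFEM: estimate diag(A^{-1})
# without forming A^{-1}, via Rademacher probes and linear solves.
rng = np.random.default_rng(1)
n, k = 50, 2000
B = rng.normal(size=(n, n))
A = B @ B.T + n * np.eye(n)                    # well-conditioned SPD matrix

probes = rng.choice([-1.0, 1.0], size=(n, k))  # Rademacher probe vectors
solves = np.linalg.solve(A, probes)            # one linear solve per probe

# Diagonal estimation rule: E[z * (A^{-1} z)] equals diag(A^{-1}) elementwise.
diag_est = (probes * solves).mean(axis=1)
diag_true = np.diag(np.linalg.inv(A))
print(float(np.mean(np.abs(diag_est - diag_true) / diag_true)))
```

The explicit inverse appears here only to check the estimate; CoFEM's point is that the estimator itself needs nothing but matrix-vector solves.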
    Q-learning with online random forests. (arXiv:2204.03771v1 [stat.ML])
    $Q$-learning is the most fundamental model-free reinforcement learning algorithm. Deployment of $Q$-learning requires approximation of the state-action value function (also known as the $Q$-function). In this work, we provide online random forests as $Q$-function approximators and propose a novel method wherein the random forest is grown as learning proceeds (through expanding forests). We demonstrate improved performance of our methods over state-of-the-art Deep $Q$-Networks in two OpenAI gyms (`blackjack' and `inverted pendulum') but not in the `lunar lander' gym. We suspect that the resilience to overfitting enjoyed by random forests recommends our method for common tasks that do not require a strong representation of the problem domain. We show that expanding forests (in which the number of trees increases as data comes in) improve performance, suggesting that expanding forests are viable for other applications of online random forests beyond the reinforcement learning setting.  ( 2 min )
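    For reference, the update being approximated is the standard $Q$-learning rule; a tabular toy version on a small chain environment is below (the paper's contribution is to replace the table with an online random forest regressor that grows as learning proceeds):

```python
import numpy as np

# Minimal tabular Q-learning on a 5-state chain with reward at the right end.
rng = np.random.default_rng(0)
n_states, n_actions = 5, 2                     # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    s = int(rng.integers(n_states - 1))        # random non-terminal start
    for _ in range(20):
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # Bellman update
        s = s2
        if r > 0:
            break

# The greedy policy learns to move right from every non-terminal state.
print([int(Q[s].argmax()) for s in range(n_states - 1)])
```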
    Pretext Tasks selection for multitask self-supervised speech representation learning. (arXiv:2107.00594v4 [eess.AS] UPDATED)
    Through solving pretext tasks, self-supervised learning leverages unlabeled data to extract useful latent representations replacing traditional input features in the downstream task. In audio/speech signal processing, a wide range of features were engineered through decades of research efforts. As it turns out, learning to predict such features (a.k.a. pseudo-labels) has proven to be a particularly relevant pretext task, leading to useful self-supervised representations which prove to be effective for downstream tasks. However, methods and common practices for combining such pretext tasks for better performance on the downstream task have not been explored and understood properly. In fact, the process relies almost exclusively on a computationally heavy experimental procedure, which becomes intractable as the number of pretext tasks increases. This paper introduces a method to select a group of pretext tasks among a set of candidates. The method we propose estimates calibrated weights for the partial losses corresponding to the considered pretext tasks during the self-supervised training process. The experiments conducted on automatic speech recognition, speaker and emotion recognition validate our approach, as the groups selected and weighted with our method perform better than classic baselines, thus facilitating the selection and combination of relevant pseudo-labels for self-supervised representation learning.  ( 2 min )
    Learning Polynomial Transformations. (arXiv:2204.04209v1 [cs.LG])
    We consider the problem of learning high dimensional polynomial transformations of Gaussians. Given samples of the form $p(x)$, where $x\sim N(0, \mathrm{Id}_r)$ is hidden and $p: \mathbb{R}^r \to \mathbb{R}^d$ is a function where every output coordinate is a low-degree polynomial, the goal is to learn the distribution over $p(x)$. This problem is natural in its own right, but is also an important special case of learning deep generative models, namely pushforwards of Gaussians under two-layer neural networks with polynomial activations. Understanding the learnability of such generative models is crucial to understanding why they perform so well in practice. Our first main result is a polynomial-time algorithm for learning quadratic transformations of Gaussians in a smoothed setting. Our second main result is a polynomial-time algorithm for learning constant-degree polynomial transformations of Gaussians in a smoothed setting, when the rank of the associated tensors is small. In fact our results extend to any rotation-invariant input distribution, not just Gaussian. These are the first end-to-end guarantees for learning a pushforward under a neural network with more than one layer. Along the way, we also give the first polynomial-time algorithms with provable guarantees for tensor ring decomposition, a popular generalization of tensor decomposition that is used in practice to implicitly store large tensors.  ( 2 min )
    Two-stage Training of Graph Neural Networks for Graph Classification. (arXiv:2011.05097v4 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have received massive attention in the field of machine learning on graphs. Inspired by the success of neural networks, a line of research has been conducted to train GNNs to deal with various tasks, such as node classification, graph classification, and link prediction. In this work, our task of interest is graph classification. Several GNN models have been proposed and shown great accuracy in this task. However, the question is whether usual training methods fully realize the capacity of the GNN models. In this work, we propose a two-stage training framework based on triplet loss. In the first stage, GNN is trained to map each graph to a Euclidean-space vector so that graphs of the same class are close while those of different classes are mapped far apart. Once graphs are well-separated based on labels, a classifier is trained to distinguish between different classes. This method is generic in the sense that it is compatible with any GNN model. By adapting five GNN models to our method, we demonstrate consistent improvements in accuracy and better utilization of each GNN's capacity over its original training method, by up to 5.4 percentage points, across 12 datasets.  ( 2 min )
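    The first-stage objective in a nutshell is a triplet loss: pull same-class embeddings together and push different-class embeddings at least a margin apart. A minimal sketch, with plain vectors standing in for GNN graph embeddings:

```python
import numpy as np

# Triplet loss: penalize whenever the anchor-positive distance is not at least
# `margin` smaller than the anchor-negative distance.
def triplet_loss(anchor, positive, negative, margin=1.0):
    d_pos = np.linalg.norm(anchor - positive)  # same-class distance
    d_neg = np.linalg.norm(anchor - negative)  # different-class distance
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])                       # same class, nearby
n = np.array([3.0, 0.0])                       # different class, far away
print(triplet_loss(a, p, n))                   # 0.0: margin already satisfied
```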
    Sample Complexity versus Depth: An Information Theoretic Analysis. (arXiv:2203.00246v3 [cs.LG] UPDATED)
    Deep learning has proven effective across a range of data sets. In light of this, a natural inquiry is: "for what data generating processes can deep learning succeed?" In this work, we study the sample complexity of learning multilayer data generating processes of a sort for which deep neural networks seem to be suited. We develop general and elegant information-theoretic tools that accommodate analysis of any data generating process -- shallow or deep, parametric or nonparametric, noiseless or noisy. We then use these tools to characterize the dependence of sample complexity on the depth of multilayer processes. Our results indicate roughly linear dependence on depth. This is in contrast to previous results that suggest exponential or high-order polynomial dependence.  ( 2 min )
    TF-Coder: Program Synthesis for Tensor Manipulations. (arXiv:2003.09040v4 [cs.PL] UPDATED)
    The success and popularity of deep learning is on the rise, partially due to powerful deep learning frameworks such as TensorFlow and PyTorch that make it easier to develop deep learning models. However, these libraries also come with steep learning curves, since programming in these frameworks is quite different from traditional imperative programming with explicit loops and conditionals. In this work, we present a tool called TF-Coder for programming by example in TensorFlow. TF-Coder uses a bottom-up weighted enumerative search, with value-based pruning of equivalent expressions and flexible type- and value-based filtering to ensure that expressions adhere to various requirements imposed by the TensorFlow library. We train models to predict TensorFlow operations from features of the input and output tensors and natural language descriptions of tasks, to prioritize relevant operations during search. TF-Coder solves 63 of 70 real-world tasks within 5 minutes, sometimes finding simpler solutions in less time compared to experienced human programmers.  ( 2 min )
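    The core of the search can be sketched in a few lines. This is a bare-bones bottom-up enumerative synthesizer with value-based deduplication; TF-Coder adds operation weights, type- and value-based filtering, and the full TensorFlow operation set, and plain arithmetic stands in for tensor operations here:

```python
# Minimal bottom-up enumerative synthesis: grow expressions level by level,
# deduplicate by computed value, and stop when one matches the I/O example.
def synthesize(inputs, output, max_depth=3):
    values = dict(inputs)                      # expression string -> value
    exprs = {v: k for k, v in values.items()}  # value -> expression (pruning)
    for _ in range(max_depth):
        items = list(values.items())
        new = {}
        for e1, v1 in items:
            for e2, v2 in items:
                for op, f in (("+", lambda a, b: a + b),
                              ("*", lambda a, b: a * b)):
                    v = f(v1, v2)
                    if v not in exprs:         # value-based deduplication
                        expr = f"({e1} {op} {e2})"
                        new[expr] = v
                        exprs[v] = expr
        values.update(new)
        if output in exprs:
            return exprs[output]
    return None

print(synthesize({"x": 3, "y": 4}, 19))
```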
    Neural network training under semidefinite constraints. (arXiv:2201.00632v2 [cs.LG] UPDATED)
    This paper is concerned with the training of neural networks (NNs) under semidefinite constraints, which allows for NN training with robustness and stability guarantees. In particular, we set up an efficient and scalable training scheme for NN training problems of this kind based on interior point methods, while we also exploit the structure of the underlying matrix constraint. We apply our training scheme to several relevant examples that have been studied in the literature and newly present the application of the method to the training of Wasserstein generative adversarial networks (WGANs). In numerical examples, we show the superiority of our method and its applicability to WGAN training.  ( 2 min )
    Trading off Accuracy for Speedup: Multiplier Bootstraps for Subgraph Counts. (arXiv:2009.06170v5 [stat.ME] UPDATED)
    We propose a new class of multiplier bootstraps for count functionals, ranging from a fast, approximate linear bootstrap tailored to sparse, massive graphs to a quadratic bootstrap procedure that offers refined accuracy for smaller, denser graphs. For the fast, approximate linear bootstrap, we show that $\sqrt{n}$-consistent inference of the count functional is attainable in certain computational regimes that depend on the sparsity level of the graph. Furthermore, even in more challenging regimes, we prove that our bootstrap procedure offers valid coverage and vanishing confidence intervals. For the quadratic bootstrap, we establish an Edgeworth expansion and show that this procedure offers higher-order accuracy under appropriate sparsity conditions. We complement our theoretical results with a simulation study and real data analysis and verify that our procedure offers state-of-the-art performance for several functionals.  ( 2 min )
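    The multiplier-bootstrap idea in its simplest form: instead of resampling observations, perturb each one's contribution by an i.i.d. mean-one multiplier and recompute the statistic. The sample mean stands in below for the paper's subgraph-count functionals:

```python
import numpy as np

# Multiplier bootstrap for the sample mean: each replicate reweights the
# observations with Gaussian multipliers of mean one instead of resampling.
rng = np.random.default_rng(0)
x = rng.exponential(size=500)
stat = x.mean()

boot = np.array([np.mean(rng.normal(loc=1.0, scale=1.0, size=x.size) * x)
                 for _ in range(2000)])
lo, hi = np.quantile(boot, [0.025, 0.975])
print(float(lo), float(hi))                    # bootstrap confidence interval
```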
    Free Energy Evaluation Using Marginalized Annealed Importance Sampling. (arXiv:2204.03784v1 [stat.ML])
    The evaluation of the free energy of a stochastic model is considered to be a significant issue in various fields of physics and machine learning. However, the exact free energy evaluation is computationally infeasible because it includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method, similar to simulated annealing, and can effectively approximate the free energy. This study proposes a new AIS-based approach, referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail from both theoretical and numerical perspectives. Based on this investigation, we prove that mAIS is more effective than AIS under a certain condition.  ( 2 min )
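    Plain AIS, which mAIS builds on, is easy to sketch on a problem with a known answer: estimate log Z of an unnormalized Gaussian target by annealing from a standard normal prior (schedule, chain count, and Metropolis step are arbitrary illustrative choices):

```python
import numpy as np

# Toy AIS run: estimate log Z of f1(x) = exp(-x^2 / (2 sigma^2)), whose true
# value is log(sigma * sqrt(2 pi)), annealing from a standard normal prior.
rng = np.random.default_rng(0)
sigma = 2.0
betas = np.linspace(0.0, 1.0, 51)              # annealing schedule
n_chains = 2000

def log_f(x, beta):                            # log intermediate density
    return -(1 - beta) * x ** 2 / 2 - beta * x ** 2 / (2 * sigma ** 2)

x = rng.normal(size=n_chains)                  # exact samples at beta = 0
log_w = np.zeros(n_chains)
for b0, b1 in zip(betas[:-1], betas[1:]):
    log_w += log_f(x, b1) - log_f(x, b0)       # importance-weight update
    prop = x + rng.normal(size=n_chains)       # one Metropolis step at b1
    accept = np.log(rng.random(n_chains)) < log_f(prop, b1) - log_f(x, b1)
    x = np.where(accept, prop, x)

# Add back the prior's normalizer sqrt(2 pi) to get log Z of the target.
log_z = np.log(np.mean(np.exp(log_w))) + 0.5 * np.log(2 * np.pi)
print(float(log_z))
```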
    Handling highly correlated genes in prediction analysis of genomic studies. (arXiv:2007.02455v4 [stat.AP] UPDATED)
    Background: Selecting feature genes to predict phenotypes is one of the typical tasks in analyzing genomics data. Though many general-purpose algorithms were developed for prediction, dealing with highly correlated genes in the prediction model is still not well addressed. High correlation among genes introduces technical problems, such as multi-collinearity issues, leading to unreliable prediction models. Furthermore, when a causal gene (whose variants have an actual biological effect on a phenotype) is highly correlated with other genes, most algorithms select the feature gene from the correlated group in a purely data-driven manner. Since the correlation structure among genes could change substantially when conditions change, a prediction model based on incorrectly selected feature genes is unreliable. Therefore, we aim to keep the causal biological signal in the prediction process and build a more robust prediction model. Method: We propose a grouping algorithm, which treats highly correlated genes as a group and uses their common pattern to represent the group's biological signal in feature selection. Our novel grouping algorithm can be integrated into existing prediction algorithms to enhance their prediction performance. Our proposed grouping method has two advantages. First, using the gene group's common patterns makes the prediction more robust and reliable under changing conditions. Second, it reports whole correlated gene groups as discovered biomarkers for prediction tasks, allowing researchers to conduct follow-up studies to identify causal genes within the identified groups. Result: Using real benchmark scRNA-seq datasets with simulated cell phenotypes, we demonstrate our novel method significantly outperforms standard models in both (1) prediction of cell phenotypes and (2) feature gene selection.  ( 2 min )
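    The grouping idea can be sketched on toy data: cluster features whose pairwise correlation exceeds a threshold and let a common pattern represent each group. The greedy thresholding and group-mean representative below are illustrative stand-ins, not the paper's exact algorithm:

```python
import numpy as np

# Three features are noisy copies of one causal signal; a fourth is independent.
rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
X = np.column_stack([signal + 0.1 * rng.normal(size=n) for _ in range(3)]
                    + [rng.normal(size=n)])

# Greedy grouping by pairwise correlation above a threshold.
corr = np.corrcoef(X, rowvar=False)
threshold = 0.8
groups, assigned = [], set()
for i in range(X.shape[1]):
    if i in assigned:
        continue
    group = [j for j in range(X.shape[1])
             if j not in assigned and corr[i, j] > threshold]
    assigned.update(group)
    groups.append(group)

# Each group is replaced by its mean profile for downstream prediction.
representatives = np.column_stack([X[:, g].mean(axis=1) for g in groups])
print(groups)
```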
    Seeded graph matching for the correlated Wigner model via the projected power method. (arXiv:2204.04099v1 [math.ST])
    In the graph matching problem we observe two graphs $G,H$ and the goal is to find an assignment (or matching) between their vertices such that some measure of edge agreement is maximized. We assume in this work that the observed pair $G,H$ has been drawn from the correlated Wigner model -- a popular model for correlated weighted graphs -- where the entries of the adjacency matrices of $G$ and $H$ are independent Gaussians and each edge of $G$ is correlated with one edge of $H$ (determined by the unknown matching) with the edge correlation described by a parameter $\sigma\in [0,1)$. In this paper, we analyse the performance of the projected power method (PPM) as a seeded graph matching algorithm where we are given an initial partially correct matching (called the seed) as side information. We prove that if the seed is close enough to the ground-truth matching, then with high probability, PPM iteratively improves the seed and recovers the ground-truth matching (either partially or exactly) in $\mathcal{O}(\log n)$ iterations. Our results prove that PPM works even in regimes of constant $\sigma$, thus extending the analysis in (Mao et al.,2021) for the sparse Erd\"os-Renyi model to the (dense) Wigner model. As a byproduct of our analysis, we see that the PPM framework generalizes some of the state-of-the-art algorithms for seeded graph matching. We support and complement our theoretical findings with numerical experiments on synthetic data.  ( 2 min )
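    The PPM iteration itself is compact: multiply the current matching by both adjacency matrices and project back onto permutation matrices. A toy sketch on a correlated Wigner pair, with brute force over n! permutations standing in for a linear assignment solver (sizes, noise level, and seed construction are illustrative assumptions):

```python
import numpy as np
from itertools import permutations

# Toy PPM for seeded graph matching: X <- Proj(A X B), where Proj rounds to
# the permutation matrix maximizing the inner product with its argument.
rng = np.random.default_rng(0)
n, sigma = 6, 0.1
A = rng.normal(size=(n, n)); A = (A + A.T) / np.sqrt(2)
perm_true = rng.permutation(n)
P_true = np.eye(n)[perm_true]
noise = rng.normal(size=(n, n)); noise = (noise + noise.T) / np.sqrt(2)
B = P_true.T @ A @ P_true + sigma * noise      # correlated copy of A

def project(M):                                # argmax_P <M, P> over permutations
    best = max(permutations(range(n)),
               key=lambda p: sum(M[i, p[i]] for i in range(n)))
    return np.eye(n)[list(best)]

seed = perm_true.copy()
seed[[0, 1]] = seed[[1, 0]]                    # partially correct seed
X = np.eye(n)[seed]
for _ in range(10):
    X = project(A @ X @ B)

print(int((X * P_true).sum()))                 # matched entries out of n
```

With small noise and a good seed, the iterates typically improve on the seed's overlap with the planted matching.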

  • Open

    Trippy AI Dream 30 - Howl's Moving Castle Post-Apocalyptic War Scenes VQ...
    submitted by /u/LordPewPew777 [link] [comments]
    Is there an AI which can turn images into simple versions of the original image?
    So that the wrinkles and shadows are removed, etc. submitted by /u/xXNOdrugsForMEXx [link] [comments]
    The Singularity is Now
    submitted by /u/ManandMultiverse [link] [comments]
    AI Graphics: Design your dream body with a slider
    submitted by /u/much_successes [link] [comments]  ( 1 min )
    Does AI exist that takes an image of a real person and edit/generate photos of them?
    submitted by /u/NootropicLove [link] [comments]
  • Open

    Circular slide rule
    I explained the basics of how a slide rule works in the previous post. But how does a circular slide rule work? Apparently the prop Mr. Spock is holding is an E6B aircraft slide rule. It includes a circular slide rule and more functionality. Start with an ordinary straight slide rule, with each bar labeled […] Circular slide rule first appeared on John D. Cook.  ( 2 min )
    Why a slide rule works
    Suppose you have two sticks. The length of one is log x, and the length of the other is log y. If you put the two sticks end to end, the combined length is log x + log y = log xy. That’s the basic idea behind a slide rule. The simplest slide rule consists […] Why a slide rule works first appeared on John D. Cook.  ( 2 min )
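    The post's identity in code: placing a stick of length log(x) end to end with one of length log(y) gives length log(x) + log(y) = log(xy), so reading the combined length off the log scale multiplies the numbers.

```python
import math

# Slide-rule multiplication: add lengths on the log scale, read off the product.
x, y = 2.0, 3.0
combined = math.log10(x) + math.log10(y)
print(10 ** combined)                          # recovers x * y = 6 (up to rounding)
```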
  • Open

    [D] Machine Learning - WAYR (What Are You Reading) - Week 135
    This is a place to share machine learning research papers, journals, and articles that you're reading this week. If it relates to what you're researching, by all means elaborate and give us your insight, otherwise it could just be an interesting paper you've read. Please try to provide some insight from your understanding and please don't post things which are present in wiki. Preferably you should link the arxiv page (not the PDF, you can easily access the PDF from the summary page but not the other way around) or any other pertinent links. Previous weeks : 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100 101-110 111-120 121-130 131-140  ( 1 min )
    [N]: Dall-E 2 Explained
    submitted by /u/giugiacaglia [link] [comments]  ( 1 min )
    [R] Use static classifiers for dynamic point cloud tasks (3D) and use action classifiers for temporal anomaly detection (2D) - Link to a free online lecture by the author in comments
    submitted by /u/pinter69 [link] [comments]  ( 1 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 2 min )
    [P] image similarity metrics or algorithms
    I want to perform image similarity between images from frames of 2 different movie trailers. I am currently using SSIM and VGG 16 individually. But SSIM does not capture color differences and VGG 16 isn't capturing structural integrity. I can use both together, but I wanted to know if there is any metric or algorithm which can capture both together with fewer discrepancies. Will appreciate any help. Thank you! submitted by /u/terminatorash2199 [link] [comments]  ( 1 min )
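    One pragmatic answer is to blend a structural term with a color term explicitly. A sketch, assuming float images in [0, 1]: a global SSIM (computed over the whole image rather than over local windows, unlike the standard windowed SSIM) combined with per-channel histogram intersection, with an arbitrary blending weight:

```python
import numpy as np

# Structural term: SSIM formula applied globally to grayscale images.
def global_ssim(a, b, L=1.0):
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mu_a, mu_b = a.mean(), b.mean()
    va, vb = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (va + vb + c2))

# Color term: histogram intersection averaged over channels.
def color_similarity(a, b, bins=16):
    sims = []
    for ch in range(a.shape[2]):
        ha, _ = np.histogram(a[..., ch], bins=bins, range=(0, 1))
        hb, _ = np.histogram(b[..., ch], bins=bins, range=(0, 1))
        sims.append(np.minimum(ha, hb).sum() / ha.sum())
    return float(np.mean(sims))

def combined(a, b, w=0.5):
    gray_a, gray_b = a.mean(axis=2), b.mean(axis=2)
    return w * global_ssim(gray_a, gray_b) + (1 - w) * color_similarity(a, b)

img = np.random.default_rng(0).random((32, 32, 3))
print(combined(img, img))                      # identical images score 1.0
```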
    [R] Interested in a Postdoctoral position bridging machine learning and neuroimaging?
    Recurrent neural networks for brain time series - Sano Centre for Computational Personalised Medicine submitted by /u/alecrimi [link] [comments]
    [R] ML with Intermediate Mathematics: VAEs with Normalized Flow (live series)
    Hi everyone, I'd like to share with you an exciting upcoming live series by Prof. Richard Xu of Hong Kong Baptist University. If you're interested, please click here to register! Description: "I have been planning to start a machine learning live series on topics that involve some intermediate mathematics, so I can help you to clarify some concepts. In order to fully grasp these concepts, you need to have sound knowledge of linear algebra, calculus, statistics and probability. However, if you just want to come and hear it for fun, please do so as well! The first topic is variational autoencoders with normalized flow, which I'll fully explain its beautiful mathematics over a period of a few sessions. You can find my notes on my GitHub site: https://github.com/roboticcam/machine-learning-notes/blob/master/files/vb_nf.pdf I will post the Zoom link to the registered participants. Please join us!" submitted by /u/ML_Live_Series [link] [comments]  ( 1 min )
    [research] Issues visualising a Resnet
    Hi all, We have a model used in cardiac MRi imaging, it is used to select the best image in a series of images. It consists of images -> Resnet -> LSTM -> output. The heatmap we generate from the Resnet alone shows output like the image attached, instead of actual anatomy, there is only little squares. We think this is likely due to the residual in the Resnet because it is not present in a VGG, but does anyone else have a better explanation, and an idea of how to visualise a Resnet? Resnet Saliency Map submitted by /u/Radiology_AI [link] [comments]  ( 1 min )
    [R] Critical analysis of deconfounded pretraining to improve visio-linguistic models
    Hi reddit, happy to share our new paper "Critical analysis of deconfounded pretraining to improve visio-linguistic models". In a nutshell, it's on the problem of out-of-distribution performance for visio-linguistic models, and it takes a closer look / surfaces some issues with an existing technique for improving OOD performance by doing automatic deconfounding (inspired by the causality framework of Structural Causal Models). ​ Paper: https://www.frontiersin.org/articles/10.3389/frai.2022.736791/full Code: https://github.com/Natithan/p1_causality Abstract: An important problem with many current visio-linguistic models is that they often depend on spurious correlations. A typical example of a spurious correlation between two variables is one that is due to a third variable causing …  ( 2 min )
    Best Papers That Solve Novel Problems? [D]
    We often talk about how the publish-or-perish paradigm leads to constant minor improvements on the same problems (Image classification, text generation, etc). What are some of the best papers that do the opposite? Rather than solving known problems in a marginally better way, they solve a new problem with known (or modified) methods. submitted by /u/SuspiciousWalrus99 [link] [comments]  ( 2 min )
    [Discussion] Advice on training document layout analysis models
So, a bit of background: I am doing an R&D project on improving the layout analysis of scientific documents. The proposed method is to use an active learning loop on standard object detection models, target the classes/layouts that are performing poorly, and train the model on them. We have some selection strategies based on submodular selection functions to target the pages we want, and I have set up code to extract the embeddings that will drive the selection. But I don't have prior experience in active learning, especially setting it up with detectron2, which registers a dataset before training and makes it really difficult to change the data dynamically in the middle of training, which is my use case. So I need advice on the following: 1) The document analysis datasets are huge; DocBank is 50GB of images alone. How can I effectively store the embeddings in memory for when I call the selection algorithms mentioned above? 2) How do I set up an active learning loop in detectron2 for object detection, or are there alternatives? Some resources/code would be appreciated. 3) There is some literature suggesting that simple CNN backbone embeddings represent an image better than Faster R-CNN or Mask R-CNN embeddings; specifically, a paper on spliced image retrieval makes this claim. Any thoughts/prior experience on this? 4) Finally, is there any evidence that active learning improves accuracy/precision in object detection, or are there better training paradigms? Thank you for your patience. submitted by /u/ExoticAd6868 [link] [comments]  ( 1 min )
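On the storage question above, one common pattern is to keep the embeddings in a disk-backed `numpy` memmap so selection functions can stream rows without ever loading the full array. A minimal sketch, with made-up sizes and random batches standing in for the real extractor output:

```python
import numpy as np

# Hypothetical sizes; a real DocBank-scale run might be 500k pages x 512 dims.
n_pages, dim = 10_000, 64

# Write embeddings incrementally to a disk-backed array instead of holding
# them all in RAM.
emb = np.lib.format.open_memmap("embeddings.npy", mode="w+",
                                dtype=np.float32, shape=(n_pages, dim))
rng = np.random.default_rng(0)
for start in range(0, n_pages, 1_000):            # batches from the extractor
    batch = rng.random((1_000, dim), dtype=np.float32)
    emb[start:start + 1_000] = batch
emb.flush()

# Later, the selection function memory-maps the file and touches only the
# rows it actually scores, so peak RAM stays small.
emb_ro = np.load("embeddings.npy", mmap_mode="r")
scores = np.linalg.norm(emb_ro[:1_000], axis=1)   # e.g. score a candidate pool
```

Only the rows a submodular selection round actually reads are paged into memory, which sidesteps the 50GB problem entirely.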
    [D] Market Basket Analysis real-world examples and insights?
I want to know more about Market Basket Analysis's real-world use cases and the unique insights/business value derived from performing association rule mining. I heard about the beer-and-diapers case study, but many other sources invalidated it as a spurious correlation. Can someone share an example of business insights from Market Basket Analysis and any interesting patterns they were able to observe? submitted by /u/invincible_moron [link] [comments]  ( 2 min )
    [D] How are multiple training examples used in DMD, SINDy, etc.?
    The examples I have seen so far for DMD and SINDy use only 1 trajectory of the dynamical system for training. The input data is a 2D matrix, with the states/features being one dimension and time being the other dimension. But I want to use multiple trajectories of the same dynamical system for training, so the training data would be 3D (i.e., multiple 2D matrices). Are there examples where this has been done? Linear regression techniques (like pseudoinverse or LASSO) seem to be used to get the system matrix (in DMD) or the weights for the features (in SINDy). Can these methods be extended to 3D input data? submitted by /u/baigyaanik [link] [comments]  ( 1 min )
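The question above has a standard answer for DMD: snapshot pairs from different trajectories can simply be stacked side by side, as long as no pair straddles a trajectory boundary, and the same pseudoinverse fit applies. A toy sketch with made-up linear dynamics:

```python
import numpy as np

rng = np.random.default_rng(0)
A_true = np.array([[0.9, 0.1, 0.0],
                   [0.0, 0.8, 0.1],
                   [0.0, 0.0, 0.7]])   # hypothetical linear dynamics

# Several trajectories of the same system, each of shape (n_states, T).
trajectories = []
for _ in range(5):
    x = rng.standard_normal(3)
    traj = [x]
    for _ in range(20):
        x = A_true @ x
        traj.append(x)
    trajectories.append(np.array(traj).T)

# Stack (x_k, x_{k+1}) snapshot pairs from every trajectory column-wise;
# pairs never cross a trajectory boundary, so the 3D data collapses into
# two wide 2D matrices and the usual least-squares fit applies unchanged.
X = np.hstack([tr[:, :-1] for tr in trajectories])
Xp = np.hstack([tr[:, 1:] for tr in trajectories])

A_dmd = Xp @ np.linalg.pinv(X)   # exact DMD operator via pseudoinverse
```

The same stacking works for SINDy: build the feature library per trajectory, then concatenate the rows before the LASSO regression.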

    6 Business Applications that Badly Need Better AI
    The success and growth of AI is undeniable. Yet there are still basic tasks performing poorly, despite or because of automation. In some cases, you can blame reliance on outdated AI. In other cases, it is a result of corporate policies or multiple AI systems that compete against each other. The AI systems in question… Read More »6 Business Applications that Badly Need Better AI The post 6 Business Applications that Badly Need Better AI appeared first on Data Science Central.  ( 7 min )
    How exactly do you define Artificial Intelligence(AI)?
How exactly do you define Artificial Intelligence (AI)? This looks like a back-to-basics, back-to-school question, but the answer is not that simple. Recently I was trying to find a good academic definition of AI for a research paper. Surprisingly, it's not easy. In this post, I present a good definition for… Read More »How exactly do you define Artificial Intelligence(AI)? The post How exactly do you define Artificial Intelligence(AI)? appeared first on Data Science Central.  ( 2 min )
    Public wary of Meta’s Metaverse vision
There are signs that Meta's plans for the metaverse are faltering, including plummeting stock prices and the company's announcement that it may withdraw from the EU market. The troubles stem from a myriad of issues, the most significant of which are data collection privacy issues and a lack of investor and public confidence in the… Read More »Public wary of Meta's Metaverse vision The post Public wary of Meta's Metaverse vision appeared first on Data Science Central.  ( 4 min )
    NLQ: Why You Might Not Need To Call A Data Analyst Anymore
NLQ (Natural Language Query, also known as Text-to-SQL or NL2SQL) is an arm of computational linguistics that helps users fetch required data, visualizations, and insights from sentences written in human language. As a business user, knowing data schemas, table and column names, knowing metadata, having technical know-how of a BI tool or data querying skills… Read More »NLQ: Why You Might Not Need To Call A Data Analyst Anymore The post NLQ: Why You Might Not Need To Call A Data Analyst Anymore appeared first on Data Science Central.  ( 4 min )
    Advance in your finance and accounting careers with top technical skills
Finance and accounting remain among the top career choices for business professionals. Employment of accountants and auditors is projected to grow 7 percent from 2020 to 2030, about as fast as the average for all occupations, with about 135,000 openings for accountants and auditors projected each year… Read More »Advance in your finance and accounting careers with top technical skills The post Advance in your finance and accounting careers with top technical skills appeared first on Data Science Central.  ( 3 min )

    Anybody ever programmed a 1st order differential equation model in MuJoCo?
    submitted by /u/SmarterCloud [link] [comments]  ( 1 min )
    Google AI Researchers Propose a Meta-Algorithm, Jump Start Reinforcement Learning, That Uses Prior Policies to Create a Learning Curriculum That Improves Performance
In the field of artificial intelligence, reinforcement learning is a machine-learning strategy that rewards desirable behaviors while penalizing undesirable ones. An agent perceives its surroundings and learns to act through trial and error, effectively receiving feedback on what works. However, learning rules from scratch in contexts with hard exploration problems is a big challenge in RL. Because the agent receives no intermediate incentives, it cannot determine how close it is to completing the goal, so it must explore the space at random until it happens to succeed, e.g. until the door finally opens. Given the length of the task and the level of precision required, this is highly unlikely. Random exploration of the state space can be avoided when preliminary information is available: such prior knowledge helps the agent determine which states of the environment are desirable and should be investigated further. Offline data collected from human demonstrations, programmed policies, or other RL agents can be used to train a policy that then initializes a new RL policy. Where neural networks represent the policies, this means copying the pre-trained policy's network weights into the new RL policy, effectively turning the new policy into the pre-trained one. However, naively initializing a new RL policy like this frequently fails, especially for value-based RL approaches. Continue reading the summary Paper: https://arxiv.org/pdf/2204.02372.pdf Project: https://jumpstart-rl.github.io/ submitted by /u/No_Coffee_4638 [link] [comments]  ( 2 min )
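The roll-in idea in the summary can be sketched in a few lines. The following is a toy illustration (a hypothetical chain environment, not the authors' code): a pretrained guide policy controls the first h steps of each episode, an untrained exploration policy takes over after that, and the curriculum shrinks h over time.

```python
import random

def rollout(env_step, reset, guide, explorer, h, max_steps=50):
    """Roll in with the guide policy for the first h steps, then hand
    control to the exploration policy (the core jump-start idea)."""
    s = reset()
    total = 0.0
    for t in range(max_steps):
        a = guide(s) if t < h else explorer(s)
        s, r, done = env_step(s, a)
        total += r
        if done:
            break
    return total

# Hypothetical chain environment: walk from state 0 to the goal state 10.
def reset():
    return 0

def env_step(s, a):
    s = min(10, max(0, s + a))
    return s, float(s == 10), s == 10

guide = lambda s: 1                           # pretrained guide: always right
explorer = lambda s: random.choice([-1, 1])   # untrained policy: random walk

# Curriculum: start with a long roll-in and shrink h as the learner improves;
# success rates degrade gracefully as the hand-over point moves earlier.
random.seed(0)
returns = {h: sum(rollout(env_step, reset, guide, explorer, h)
                  for _ in range(200)) / 200
           for h in (9, 5, 0)}
```

Handing over near the goal (large h) yields far more successful episodes than exploring from scratch (h = 0), which is the learning signal the curriculum exploits.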
    "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language", Zeng et al 2022
    submitted by /u/gwern [link] [comments]  ( 1 min )
Classical Dynamic Programming vs Policy Iteration
    Cracking my head trying to figure out the differences between classical dynamic programming and policy iteration. I understand that policy iteration is itself a form of dynamic programming. But if we were to compare the traditional operations of dynamic programming with policy iteration, what would the differences be? Thanks heaps! submitted by /u/BalramVeeragoo [link] [comments]  ( 1 min )
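For concreteness, policy iteration is classical dynamic programming specialized to MDPs: it alternates exact policy evaluation (a linear solve, rather than value-iteration sweeps) with greedy policy improvement. A minimal sketch on a made-up 3-state, 2-action deterministic MDP:

```python
import numpy as np

# Tiny deterministic MDP (made up for illustration): P[a][s] = next state,
# R[a][s] = reward for taking action a in state s.
P = np.array([[1, 2, 2],      # action 0
              [0, 0, 1]])     # action 1
R = np.array([[0., 1., 0.],
              [0., 0., 5.]])
gamma, n_s = 0.9, 3

policy = np.zeros(n_s, dtype=int)
while True:
    # Policy evaluation: solve V = R_pi + gamma * P_pi V exactly.
    P_pi = np.eye(n_s)[P[policy, np.arange(n_s)]]
    V = np.linalg.solve(np.eye(n_s) - gamma * P_pi,
                        R[policy, np.arange(n_s)])
    # Policy improvement: act greedily with respect to V.
    Q = R + gamma * V[P]
    new_policy = Q.argmax(axis=0)
    if np.array_equal(new_policy, policy):
        break                  # stable policy => optimal
    policy = new_policy
```

Value iteration, by contrast, skips the exact solve and just applies the Bellman optimality backup repeatedly; both are instances of the same dynamic-programming principle.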
Is my understanding of why future rewards are considered correct?
    To my understanding, the Q value is updated like this: Q[s,a] = Q[s,a] + lr * (reward + gamma * max(Q[s,a]t+1) - Q[s,a]) where the future state's value is considered, since the best immediate reward doesn't guarantee the optimal path. E.g.: Path A: Q[s,a]t = 1, Q[s,a]t+1 = 10, total: 11. Path B: Q[s,a]t = 5, Q[s,a]t+1 = 1, total: 6. Not sure if this is a good analogy, but it gives the result that A is the optimal path even though its immediate reward at t is less than B's. Further question: is there additional benefit to considering future rewards further than Q[s,a]t+1? Example: Q[s,a] = Q[s,a] + lr * (reward + gamma * max(Q[s,a]t+2) - gamma * max(Q[s,a]t+1) - Q[s,a]) submitted by /u/DangerNoodle314 [link] [comments]  ( 1 min )
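The intuition in the post can be checked numerically. In this toy sketch (made-up numbers), state 1 already "knows" a large reward is reachable, so the one-step bootstrap pulls the earlier Q value toward the long-run return, not merely the immediate reward:

```python
import numpy as np

def q_update(Q, s, a, r, s_next, lr=0.1, gamma=0.9):
    # One-step Q-learning: bootstrap from the best action in the next state.
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += lr * (td_target - Q[s, a])
    return Q

# Toy version of the "path A" case: state 1 already values a reward of 10,
# so the target for Q[0, 0] is 1 + 0.9 * 10 = 10, not just the immediate 1.
Q = np.zeros((3, 2))
Q[1] = [10.0, 0.0]
Q = q_update(Q, s=0, a=0, r=1.0, s_next=1)   # Q[0, 0]: 0 + 0.1 * (10 - 0) = 1.0
```

On the further question: looking more than one step ahead is standard and is called n-step returns, e.g. a two-step target r_t + gamma * r_{t+1} + gamma^2 * max_a Q(s_{t+2}, a); the terms accumulate rather than being subtracted as in the post's proposed formula.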
    Any reason why to use several optimizers in Pytorch implementation of REDQ?
Hi guys. I am currently implementing REDQ by modifying a working implementation of SAC (basically adapted from Spinup), and so far my implementation doesn't work; I am trying to understand why. Looking at the authors' implementation, I notice they use one PyTorch optimizer per Q-network, whereas I use only one for all parameters. So I wonder: is there any good reason for using several optimizers here? Thanks! submitted by /u/yannbouteiller [link] [comments]  ( 1 min )
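For element-wise optimizers such as SGD and Adam, a single optimizer over all Q-network parameters computes exactly the same updates as one optimizer per network (the state is kept per parameter), so the per-network split is most likely organizational, e.g. to step or schedule each critic separately, and is unlikely to explain the failure. A toy numpy demonstration of the equivalence for plain SGD:

```python
import numpy as np

# Toy stand-ins for two Q-networks' parameter vectors and their gradients.
rng = np.random.default_rng(0)
params = [rng.standard_normal(4) for _ in range(2)]
grads = [rng.standard_normal(4) for _ in range(2)]
lr = 0.01

# (a) a single optimizer over all parameters: one SGD step on the
#     concatenation of both networks' parameters.
joint = np.concatenate(params) - lr * np.concatenate(grads)

# (b) one optimizer per network: independent SGD steps, then concatenate.
separate = np.concatenate([p - lr * g for p, g in zip(params, grads)])
# The two results coincide element-wise, because SGD (and Adam) keep
# purely per-parameter state.
```

More likely culprits in a REDQ port are the random subset of critics used in the target computation or the update-to-data ratio, which differ from vanilla SAC.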
    Learning in Noisy Observation space
    I am fairly new to RL. I'm trying to train an agent (like in gym's Cartpole env) in an environment with noisy (Gaussian noise) observation. I have added Gaussian noise to angle only (not to cart position, cart velocity or ang_velocity). I was fooling around with PPO in stable_baselines but haven't had much luck. Any suggestions on what needs to be tweaked or any good algorithm for this task? Also, I tried changing the default forcing magnitude of 10 to other values like 30 but it didn't help much. Thanks submitted by /u/Black_Beard53 [link] [comments]  ( 2 min )
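One way to keep the noise injection contained (and easy to sweep) is a small observation-wrapping function; the sketch below is a hypothetical stand-in for a gym `ObservationWrapper`, perturbing only the pole angle of a CartPole-style observation:

```python
import numpy as np

def noisy_angle(obs, rng, sigma=0.05, angle_idx=2):
    """Return a copy of a CartPole-style observation
    [x, x_dot, theta, theta_dot] with Gaussian noise on the angle only."""
    obs = np.asarray(obs, dtype=float).copy()
    obs[angle_idx] += rng.normal(0.0, sigma)
    return obs

# In a gym-style loop this would wrap the observation returned by env.step,
# e.g. obs = noisy_angle(env_obs, rng) before feeding it to the policy.
rng = np.random.default_rng(0)
clean = np.array([0.0, 0.0, 0.1, 0.0])
noisy = noisy_angle(clean, rng)
```

Since the noise makes the angle only partially observable, methods that aggregate information over time tend to help more than tuning the force magnitude: frame-stacking several recent observations, or a recurrent policy (e.g. PPO with an LSTM).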

    Summer school in between neuroimaging and machine learning
    submitted by /u/pasticciociccio [link] [comments]
    Dall-e and Dall-e2
I have been on OpenAI's website, and it is unclear what has been added in Dall-e2. Why should we sign up for something whose GitHub will be made public anyway? And what is the color code at the bottom? submitted by /u/pasticciociccio [link] [comments]  ( 1 min )
    Researchers, Including Yann Lecun, Propose ‘projUNN’: An Efficient Method For Training Deep Neural Networks With Unitary Matrices
When deep networks or inputs involve long data sequences, learning in neural networks can be unstable. Recurrent states in vanilla recurrent neural networks (RNNs) are generated by repeatedly applying a linear transformation followed by a pointwise nonlinearity. This becomes unstable when the linear transformation's eigenvalues are not of magnitude one. Unitary matrices, whose eigenvalues naturally have magnitude one, have therefore been utilized to address vanishing and exploding gradients. Unitary convolutional layers have recently been developed in a similar way to aid the construction of more stable deep networks with norm-preserving transformations. The gradient is the derivative of the loss function with respect to the weights; during backpropagation it is used to update the weights to minimize the loss. When the derivative shrinks as it travels backward through each layer, the result is a vanishing gradient: the weight updates become exponentially small, training takes excessively long, and in the worst case it stops entirely. Exploding gradients, on the other hand, occur when the slope grows with each successive layer during backpropagation; the gradient then never converges, oscillating around the minima without ever reaching a global minimum. Continue Reading Paper: https://arxiv.org/pdf/2203.05483.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )


    Ran 3D art of my AI character thru ArcaneGAN; AI making art of AI.
    submitted by /u/alex-redacted [link] [comments]
    New Technology, Old Problems: The Missing Voices in Natural Language Processing
    submitted by /u/regalalgorithm [link] [comments]
    Check Out This DeepMind’s New Language Model, Chinchilla (70B Parameters), Which Significantly Outperforms Gopher (280B) and GPT-3 (175B) on a Large Range of Downstream Evaluation Tasks
Extreme-scale language models have recently exhibited incredible performance on natural language processing challenges, owing to their ever-increasing size, now exceeding 500 billion parameters. However, while these models have grown in recent years, the amount of data used to train them has not kept pace: the current generation of huge language models is clearly undertrained. A DeepMind research team proposes three approaches for optimally choosing both model size and training length: 1) varying the model size and the number of training tokens; 2) IsoFLOP profiles; 3) fitting a parametric loss function. The final pretraining loss is modeled as a function of the number of model parameters and training tokens. They minimize this loss under a FLOPs constraint set equal to the computational budget, since the budget is a deterministic function of the number of training tokens and model parameters. Continue Reading This Research Summary Paper: https://arxiv.org/pdf/2203.15556.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
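The headline result above can be reproduced as back-of-the-envelope arithmetic. The sketch below uses two widely cited approximations (not the paper's exact fitted constants): training compute C is roughly 6 N D FLOPs, and the compute-optimal token count is roughly D = 20 N.

```python
# Rough compute-optimal sizing in the spirit of the Chinchilla analysis:
# parameters N and training tokens D scale together, with roughly D = 20 N,
# under the common estimate C = 6 N D training FLOPs. (Both figures are
# approximations, not exact values from the paper's fits.)
def chinchilla_optimal(compute_flops, tokens_per_param=20.0):
    n_params = (compute_flops / (6.0 * tokens_per_param)) ** 0.5
    return n_params, tokens_per_param * n_params

# Gopher-scale budget: ~5.76e23 FLOPs (280e9 params * 300e9 tokens * 6).
n, d = chinchilla_optimal(5.76e23)
# Yields roughly a 70B-parameter model trained on ~1.4T tokens, matching
# the Chinchilla configuration described above.
```

In other words, at Gopher's compute budget the analysis prefers a 4x smaller model trained on 4x more tokens, which is exactly the Chinchilla-vs-Gopher comparison in the summary.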
    Flaming Rose art made with snowpixelapp using AI.
    submitted by /u/AIWORQART [link] [comments]
    How do you start a professional career in the Affective Computing field?
I'm about to graduate with a master's degree in Computer Science and I'm very passionate about Affective Computing. I would like to start looking for a job in this field, but most companies (outside consulting) are looking for people with experience or a PhD. What do you recommend I do? Continue with a PhD, or try to find something, maybe at a startup? submitted by /u/_rikya_ [link] [comments]  ( 1 min )
    Deep learning to enable color vision in the dark
    submitted by /u/qptbook [link] [comments]
    Laptop for beginner?
    I'm joining MSc AI & ML this September. I want to buy a laptop. Is MacBook Air sufficient for this? If not what would you recommend to someone like me? submitted by /u/RauhanSheikh [link] [comments]  ( 1 min )
    How can I help the advancement of AI? I want to contribute and make this my career. What should I do?
    Please give a thorough and in-depth response. submitted by /u/trillswan [link] [comments]  ( 1 min )

    Gizmo is eating a clothes basket
    submitted by /u/mspurplekris [link] [comments]
    "Habitat-Web: Learning Embodied Object-Search Strategies from Human Demonstrations at Scale", Ramrakhya et al 2022 {FB} (log-scaling of crowdsourced imitation learning in VR robotics)
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Is it possible to implement ACER with A2C?
I'm looking into implementing a replay buffer in A2C. I came upon the ACER paper (https://arxiv.org/pdf/1611.01224.pdf). From my understanding, ACER is an extension of A3C, and the difference between A2C and A3C is that in A2C parameters are updated synchronously, which helps with big batch sizes. Is it still possible to implement some kind of replay buffer on A2C? Are there any papers on combining a replay buffer with A2C that you recommend I read? I'm new to the area of reinforcement learning, so I would be very grateful for any kind of help you can offer. Thanks in advance submitted by /u/lebr0n99 [link] [comments]  ( 1 min )
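Mechanically, a buffer is easy to bolt onto A2C, but replaying old data makes the updates off-policy, which is precisely what ACER corrects with truncated importance sampling (Retrace). A minimal uniform buffer sketch (a hypothetical structure; note it stores the behaviour policy's action probabilities so a later correction is possible):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal uniform replay buffer. ACER additionally needs the behaviour
    policy's action probabilities (mu_probs below) stored per transition to
    compute importance weights at replay time."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)

    def push(self, transition):          # (s, a, r, s_next, done, mu_probs)
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)

    def __len__(self):
        return len(self.buf)

buf = ReplayBuffer(capacity=1000)
for i in range(100):
    buf.push((i, 0, 0.0, i + 1, False, 0.5))
batch = buf.sample(32)
```

If you replay with plain A2C losses and no importance weighting, expect biased gradients; that bias is the gap ACER was designed to close.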
    Does anyone have a link to 'The RL Discord Server'
    Supposedly there is a popular discord server for the RL community, however I am having difficulty finding it. submitted by /u/jclaessens [link] [comments]
    "Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning", Qi et al 2022
    submitted by /u/gwern [link] [comments]  ( 1 min )
    Reinforcement Learning - looking for some resources
    Hello friends, I'm looking for some resources that would let me quickly start with Reinforcement Learning (preferably in Python). I have some experience with supervised learning (e.g. deep nets) and would like to complement with some RL. Preferably a walkthrough with some examples of implementation. Can you recommend something? Thanks in advance! submitted by /u/andy-codes [link] [comments]  ( 1 min )
    I'm dumb at maths: what does this mean?
So, learning about Q-learning, I have it all understood except for the max thingy. If you care enough to click, the blog (it's not my blog): https://towardsdatascience.com/simple-reinforcement-learning-q-learning-fcddc4b6fe56 I don't know how to turn this into a real example: Update q values: Q[state, action] = Q[state, action] + lr * (reward + gamma * np.max(Q[new_state, :]) - Q[state, action]) Specifically the last bit: np.max(Q[new_state, :]) - Q[state, action] What does the numpy max actually operate on here? Any hard examples? Thanks. submitted by /u/Togfox [link] [comments]  ( 2 min )
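To answer the question concretely: `Q[new_state, :]` selects one row of the Q-table, i.e. the values of every action in the new state, and `np.max` takes the largest of those. A worked example with made-up numbers:

```python
import numpy as np

# Q-table for a toy problem: 3 states (rows), 2 actions (columns).
Q = np.array([[0.0, 2.0],
              [1.0, 5.0],
              [0.5, 0.5]])
lr, gamma = 0.1, 0.9
state, action, reward, new_state = 0, 1, 1.0, 1

best_next = np.max(Q[new_state, :])   # row Q[1, :] is [1.0, 5.0] -> 5.0
Q[state, action] += lr * (reward + gamma * best_next - Q[state, action])
# target = 1.0 + 0.9 * 5.0 = 5.5; Q[0, 1] moves from 2.0 toward 5.5:
# 2.0 + 0.1 * (5.5 - 2.0) = 2.35
```

So the max is over actions, not states: it asks "how good is the best move from where I just landed?" and nudges the old estimate a fraction `lr` of the way toward that target.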

    [R] Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning
    #cvpr-2022 Happy to share our CVPR-2022 paper "Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning" Paper: https://arxiv.org/pdf/2111.14213.pdf Code: https://github.com/mmendiet/FedAlign Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices). However, the data distribution among clients is often non-IID in nature, making efficient optimization difficult. To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model. Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. To this end, we first present a systematic study informed by second-order indicators to better understand algorithm effectiveness in FL. Interestingly, we find that standard regularization methods are surprisingly strong performers in mitigating data heterogeneity effects. Based on our findings, we further propose a simple and effective method, FedAlign, to overcome data heterogeneity and the pitfalls of previous methods. FedAlign achieves competitive accuracy with state-of-the-art FL methods across a variety of settings while minimizing computation and memory overhead. submitted by /u/Extension-Sun1816 [link] [comments]  ( 1 min )
    [D] ICML2022 Domain conflicts system
I was wondering if the domain conflicts system is working well. I got an email from the PCs, and it seems that the domain conflict system is not working well. It said that we can enter the conflicts now, but I cannot edit the conflicts in the system. Could anyone tell me how to do it? Thanks! Dear ICML Authors, As we are seeing this happen, we just wanted to send you a brief explanation -- this only applies to some few papers. A few papers are losing reviews because of newly arising conflicts. If you did not enter your conflicts in CMT during the submission phase (as requested via CMT), then they could not be used in paper assignments. If you enter them now, any reviews by reviewers with conflicting domains will disappear, and you may see fewer reviews as a result. Unfortunately, we have no control over this, as the conflicts should have been entered when the paper was submitted. Best, Stefanie, Le and Csaba submitted by /u/Snoo_97274 [link] [comments]  ( 1 min )
    [D] Poll: How do you deploy models & endpoints?
    View Poll submitted by /u/martolini [link] [comments]  ( 1 min )
    [R][P] Generate images from text with Latent Diffusion LAION-400M Model + Gradio Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [P] tinydl - library to help with hyperparameter search and metric reporting in pytorch
Hi everyone, I built a small library to help with hyperparameter search for deep learning models created with PyTorch, because I got kinda tired of having to rewrite large pieces of code over and over again. You can check it out here: https://github.com/michi-jeremias/tinydl or you can even install it with pip (pip install tinydl). I have included a readme and an example of how the library can be used. About the library: it's pretty flexible about reporting different metrics to the console and to tensorboard (add_scalar, add_hparam) at each stage of the process, like after a batch, an epoch, or a whole run over multiple epochs. It can also be easily extended to include other metrics or new types of outputs. Since this is basically my first attempt at a software project that's not intended only to be used by myself, I'd be happy about any feedback you have for me! If the project doesn't qualify to be posted here due to being too simple/too much on a beginner level, apologies for that. submitted by /u/abacaxiquaxi [link] [comments]  ( 1 min )
    [D] StyleGAN2 Path Length Regularization Implementation Clarification
I am trying to implement StyleGAN2 and there are so many things here that are not explained either well, or at all, in the paper. How exactly is path length regularization implemented? In this PT code we can see that $|J^T_w y|$ is computed as follows:

    def g_path_regularize(fake_img, latents, mean_path_length, decay=0.01):
        noise = torch.randn_like(fake_img) / math.sqrt(
            fake_img.shape[2] * fake_img.shape[3]
        )
        grad, = autograd.grad(
            outputs=(fake_img * noise).sum(), inputs=latents, create_graph=True
        )
        path_lengths = torch.sqrt(grad.pow(2).sum(2).mean(1))
        path_mean = mean_path_length + decay * (path_lengths.mean() - mean_path_length)
        path_penalty = (path_lengths - path_mean).pow(2).mean()
        return path_penalty, path_mean.detach(), path_lengths

This is based on this official TF implementation. The problem I have is that, from what I understand, fake_img is 4D and latents is 2D. The grad output in this case will be 2D, and grad.pow(2).sum(2) cannot be computed because the third axis does not exist. Obviously people who are using these repos have not reported any issue regarding mismatch of shapes and axes, so I believe there is something else going on. Since I'm trying to implement this in my own network, I cannot get the desired shape any how; I get a 2D gradient output. submitted by /u/feryet [link] [comments]  ( 1 min )
    [P] Jax/Haiku pretrained models: MobileNet, ResNet, VGG, Xception.
I released a repository of models with optional pretrained weights (taken from TF/Keras) to be used for tasks like prediction, feature extraction and fine-tuning. Github: https://github.com/abarcel/haikumodels Currently available models: MobileNet, ResNet [50, 101, 152], VGG [16, 19], Xception. Also planning to release more, as soon as I find time for it. submitted by /u/abarcel [link] [comments]
    [D] Denoising in the latent space
I spent some time reading about and playing around with speech denoising DNNs around 2019. At the time the popular architecture was U-Net (encoder -> bottleneck -> decoder with skip connections) operating on spectrograms. These U-Nets were trained directly on noisy/clean speech pairs, and the loss was the difference between the predicted denoised image and the actual denoised image. MSE between the predicted/actual images was a baseline loss, but people also added "feature loss" or sometimes a GAN-based loss as well. Anyway, a cursory reading of the DALL-E 2 paper has me thinking about that approach. I'm curious to know if a similar approach to DALL-E's has been tried for audio denoising: Step 1: pre-train an encoder/decoder in a self-supervised fashion on a large dataset of audio. Step 2: train a denoiser to operate only in the latent space (i.e. the most compressed representation that is passed from the encoder to the decoder). Step 3: do inference by feeding the denoised latent-space vector into the decoder. Is this a common approach already? It seems like once you have a good pretrained encoder/decoder pair, the denoiser training would be much more efficient than training an entire network that does everything at once from scratch (smaller search space, faster training loop). submitted by /u/The_Amp_Walrus [link] [comments]  ( 1 min )
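The three steps above can be sketched end-to-end in miniature. In this toy version (all assumptions: linear maps stand in for the pretrained encoder/decoder, a least-squares fit stands in for the DNN denoiser, random vectors stand in for spectrogram frames), the denoiser is trained only in the compressed space and still beats passing the noisy latent straight through:

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 stand-in: a "pretrained" linear encoder E and decoder D
# (a real system would use a frozen self-supervised DNN pair).
E = rng.standard_normal((8, 32)) * 0.1   # 32-dim frames -> 8-dim latents
D = np.linalg.pinv(E)                    # decoder approximately inverts E

# Paired clean/noisy signals standing in for spectrogram frames.
clean = rng.standard_normal((500, 32))
noisy = clean + 0.3 * rng.standard_normal((500, 32))

# Step 2: train the denoiser ONLY in the 8-dim latent space
# (here linear least squares; a real one would be a small DNN with MSE loss).
z_noisy, z_clean = noisy @ E.T, clean @ E.T
W, *_ = np.linalg.lstsq(z_noisy, z_clean, rcond=None)

# Step 3: inference = encode -> denoise in latent space -> decode.
z_hat = z_noisy @ W
recon = z_hat @ D.T

err_denoised = np.mean((z_hat - z_clean) ** 2)
err_identity = np.mean((z_noisy - z_clean) ** 2)
```

The efficiency argument in the post shows up directly here: the denoiser has 8x8 parameters instead of 32x32, because the encoder has already thrown away most of the dimensionality.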
    [Discussion] MLOps vs Platform Engineering
Hey guys, I have the opportunity to either move to the platform engineering team or the freshly created MLOps team within my company. I'm interested in both careers, as I like to build infra. I'm currently a data engineer, and I find that I like building apps and enabling applications to talk to each other more than cleaning up data. I worked as a data scientist before, but I didn't like the science; I was always into engineering. What would make sense from a career perspective (both long and short term), i.e., money, promotions, attractiveness, etc.? submitted by /u/dash2392 [link] [comments]  ( 1 min )

    Distributed Reinforcement Learning for Robot Teams: A Review. (arXiv:2204.03516v1 [cs.RO])
    Purpose of review: Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds/thousands of robots, with promising applications to automated manufacturing, disaster relief, harvesting, last-mile delivery, port/airport operations, or search and rescue. The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS). This review aims to provide an analysis of the state-of-the-art in distributed MARL for multi-robot cooperation. Recent findings: Decentralized MRS face fundamental challenges, such as non-stationarity and partial observability. Building upon the "centralized training, decentralized execution" paradigm, recent MARL approaches include independent learning, centralized critic, value decomposition, and communication learning approaches. Cooperative behaviors are demonstrated through AI benchmarks and fundamental real-world robotic capabilities such as multi-robot motion/path planning. Summary: This survey reports the challenges surrounding decentralized model-free MARL for multi-robot cooperation and existing classes of approaches. We present benchmarks and robotic applications along with a discussion on current open avenues for research.  ( 2 min )
    Spatial Graph Attention and Curiosity-driven Policy for Antiviral Drug Discovery. (arXiv:2106.02190v5 [cs.LG] UPDATED)
    We developed Distilled Graph Attention Policy Network (DGAPN), a reinforcement learning model to generate novel graph-structured chemical representations that optimize user-defined objectives by efficiently navigating a physically constrained domain. The framework is examined on the task of generating molecules that are designed to bind, noncovalently, to functional sites of SARS-CoV-2 proteins. We present a spatial Graph Attention (sGAT) mechanism that leverages self-attention over both node and edge attributes as well as encoding the spatial structure -- this capability is of considerable interest in synthetic biology and drug discovery. An attentional policy network is introduced to learn the decision rules for a dynamic, fragment-based chemical environment, and state-of-the-art policy gradient techniques are employed to train the network with stability. Exploration is driven by the stochasticity of the action space design and the innovation reward bonuses learned and proposed by random network distillation. In experiments, our framework achieved outstanding results compared to state-of-the-art algorithms, while reducing the complexity of paths to chemical synthesis.  ( 2 min )
    Flexible Amortized Variational Inference in qBOLD MRI. (arXiv:2203.05845v2 [eess.IV] UPDATED)
    Streamlined qBOLD acquisitions enable experimentally straightforward observations of brain oxygen metabolism. $R_2^\prime$ maps are easily inferred; however, the Oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data. As such, existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV. This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV. Initially, we create a model that produces informative voxelwise prior distribution based on synthetic training data. Contrary to prior work, we model the joint distribution of OEF and DBV through a scaled multivariate logit-Normal distribution, which enables the values to be constrained within a plausible range. The prior distribution model is used to train an efficient amortized variational Bayesian inference model. This model learns to infer OEF and DBV by predicting real image data, with few training data required, using the signal equations as a forward model. We demonstrate that our approach enables the inference of smooth OEF and DBV maps, with a physiologically plausible distribution that can be adapted through specification of an informative prior distribution. Other benefits include model comparison (via the evidence lower bound) and uncertainty quantification for identifying image artefacts. Results are demonstrated on a small study comparing subjects undergoing hyperventilation and at rest. We illustrate that the proposed approach allows measurement of gray matter differences in OEF and DBV and enables voxelwise comparison between conditions, where we observe significant increases in OEF and $R_2^\prime$ during hyperventilation.  ( 2 min )
    First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems. (arXiv:2204.03132v1 [math.OC])
    We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs) in which the strategy sets for each player are defined by equality and inequality constraints that may depend on the choices of rival players. While the asymptotic global convergence and local convergence rate of solution procedures have been studied in this setting, the analysis of iteration complexity is still in its infancy. Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method, respectively, with an accelerated mirror-prox algorithm as the inner loop. We provide nonasymptotic theoretical guarantees for these algorithms. More specifically, we establish the global convergence rate of our algorithms for solving (strongly) monotone NGNEPs and we provide iteration complexity bounds expressed in terms of the number of gradient evaluations. Experimental results demonstrate the efficiency of our algorithms.  ( 2 min )
    DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores. (arXiv:2204.03219v1 [eess.AS])
    Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, accurate MOS prediction models would be desirable for automatic evaluation. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech, and adds a proposed module that models the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on the BVCC dataset, and its zero-shot transfer result on the BC2019 dataset is significantly improved. DDOS also won second place in the Interspeech 2022 VoiceMOS Challenge in terms of system-level score.  ( 2 min )
    Unsupervised Image-to-Image Translation with Generative Prior. (arXiv:2204.03641v1 [cs.CV])
    Unsupervised image-to-image translation aims to learn the translation between two visual domains without paired data. Despite the recent progress in image translation models, it remains challenging to build mappings between complex domains with drastic visual discrepancies. In this work, we present a novel framework, Generative Prior-guided UNsupervised Image-to-image Translation (GP-UNIT), to improve the overall quality and applicability of the translation algorithm. Our key insight is to leverage the generative prior from pre-trained class-conditional GANs (e.g., BigGAN) to learn rich content correspondences across various domains. We propose a novel coarse-to-fine scheme: we first distill the generative prior to capture a robust coarse-level content representation that can link objects at an abstract semantic level, based on which fine-level content features are adaptively learned for more accurate multi-level content correspondences. Extensive experiments demonstrate the superiority of our versatile framework over state-of-the-art methods in robust, high-quality and diversified translations, even for challenging and distant domains.  ( 2 min )
    SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis. (arXiv:2204.03040v1 [cs.SD])
    In this work, we present the SOMOS dataset, the first large-scale mean opinion scores (MOS) dataset consisting of solely neural text-to-speech (TTS) samples. It can be employed to train automatic MOS prediction systems focused on the assessment of modern synthesizers, and can stimulate advancements in acoustic model evaluation. It consists of 20K synthetic utterances of the LJ Speech voice, a public domain speech dataset which is a common benchmark for building neural acoustic models and vocoders. Utterances are generated from 200 TTS systems including vanilla neural acoustic models as well as models which allow prosodic variations. An LPCNet vocoder is used for all systems, so that the samples' variation depends only on the acoustic models. The synthesized utterances provide balanced and adequate domain and length coverage. We collect MOS naturalness evaluations on 3 English Amazon Mechanical Turk locales and share practices leading to reliable crowdsourced annotations for this task. Baseline results of state-of-the-art MOS prediction models on the SOMOS dataset are presented, while we show the challenges that such models face when assigned to evaluate synthetic utterances.  ( 2 min )
    Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection---Extended Version. (arXiv:2204.03341v1 [cs.LG])
    Time series data occurs widely, and outlier detection is a fundamental problem in data mining, which has numerous applications. Existing autoencoder-based approaches deliver state-of-the-art performance on challenging real-world data but are vulnerable to outliers and exhibit low explainability. To address these two limitations, we propose robust and explainable unsupervised autoencoder frameworks that decompose an input time series into a clean time series and an outlier time series using autoencoders. Improved explainability is achieved because clean time series are better explained with easy-to-understand patterns such as trends and periodicities. We provide insight into this by means of a post-hoc explainability analysis and empirical studies. In addition, since outliers are separated from clean time series iteratively, our approach offers improved robustness to outliers, which in turn improves accuracy. We evaluate our approach on five real-world datasets and report improvements over the state-of-the-art approaches in terms of robustness and explainability. This is an extended version of "Robust and Explainable Autoencoders for Unsupervised Time Series Outlier Detection", to appear in IEEE ICDE 2022.  ( 2 min )
    Mo\"ET: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning. (arXiv:1906.06717v4 [cs.LG] UPDATED)
    Rapid advancements in deep learning have led to many recent breakthroughs. While deep learning models achieve superior performance, often statistically better than humans, their adoption into safety-critical settings, such as healthcare or self-driving cars, is hindered by their inability to provide safety guarantees or to expose the inner workings of the model in a human understandable form. We present Mo\"ET, a novel model based on Mixture of Experts, consisting of decision tree experts and a generalized linear model gating function. Thanks to such a gating function, the model is more expressive than a standard decision tree. To support non-differentiable decision trees as experts, we formulate a novel training procedure. In addition, we introduce a hard thresholding version, Mo\"ETH, in which predictions are made solely by a single expert chosen via the gating function. Thanks to that property, Mo\"ETH allows each prediction to be easily decomposed into a set of logical rules in a form which can be easily verified. While Mo\"ET is a general use model, we illustrate its power in the reinforcement learning setting. By training Mo\"ET models using an imitation learning procedure on deep RL agents, we outperform the previous state-of-the-art technique based on decision trees while preserving the verifiability of the models. Moreover, we show that Mo\"ET can also be used in real-world supervised problems on which it outperforms other verifiable machine learning models.  ( 3 min )
    Improving Cooperative Game Theory-based Data Valuation via Data Utility Learning. (arXiv:2107.06336v2 [cs.LG] UPDATED)
    The Shapley value (SV) and Least core (LC) are classic methods in cooperative game theory for cost/profit sharing problems. Both methods have recently been proposed as a principled solution for data valuation tasks, i.e., quantifying the contribution of each individual datum in machine learning. However, both SV and LC suffer from computational challenges due to the need for retraining models on combinatorially many data subsets. In this work, we propose to boost the efficiency of computing the Shapley value or Least core by learning to estimate the performance of a learning algorithm on unseen data combinations. Theoretically, we derive bounds relating the error in the predicted learning performance to the approximation error in SV and LC. Empirically, we show that the proposed method can significantly improve the accuracy of SV and LC estimation.  ( 2 min )
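    The computational burden the paper targets is visible in a plain Monte Carlo Shapley estimator, where each call to `utility` would normally mean retraining a model on a data subset; the proposed idea is to replace that call with a cheap learned predictor. A toy sketch (the `utility` argument here is a hypothetical stand-in, not the authors' learned estimator):

```python
import numpy as np

def mc_shapley(utility, n_points, n_perms=200, seed=0):
    """Monte Carlo Shapley estimate: average each point's marginal
    contribution over random permutations. `utility(subset)` may be a
    cheap learned surrogate instead of full model retraining."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_points)
    for _ in range(n_perms):
        perm = rng.permutation(n_points)
        subset, prev = [], utility(frozenset())
        for i in perm:
            subset.append(i)
            cur = utility(frozenset(subset))
            phi[i] += cur - prev          # marginal contribution of i
            prev = cur
    return phi / n_perms

# Toy utility: the value of a subset is its size, so every point's
# Shapley value is exactly 1.
phi = mc_shapley(lambda s: len(s), n_points=5)
```

Swapping the exact `utility` for a learned one introduces the prediction error that the paper's bounds relate to the approximation error in SV and LC.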
    Federated Learning from Only Unlabeled Data with Class-Conditional-Sharing Clients. (arXiv:2204.03304v1 [cs.LG])
    Supervised federated learning (FL) enables multiple clients to share the trained model without sharing their labeled data. However, potential clients might even be reluctant to label their own data, which could limit the applicability of FL in practice. In this paper, we show the possibility of unsupervised FL whose model is still a classifier for predicting class labels, if the class-prior probabilities are shifted while the class-conditional distributions are shared among the unlabeled data owned by the clients. We propose federation of unsupervised learning (FedUL), where the unlabeled data are transformed into surrogate labeled data for each of the clients, a modified model is trained by supervised FL, and the wanted model is recovered from the modified model. FedUL is a very general solution to unsupervised FL: it is compatible with many supervised FL methods, and the recovery of the wanted model can be theoretically guaranteed as if the data have been labeled. Experiments on benchmark and real-world datasets demonstrate the effectiveness of FedUL. Code is available at https://github.com/lunanbit/FedUL.  ( 2 min )
    GraFN: Semi-Supervised Node Classification on Graph with Few Labels via Non-Parametric Distribution Assignment. (arXiv:2204.01303v2 [cs.LG] UPDATED)
    Despite the success of Graph Neural Networks (GNNs) on various applications, GNNs encounter significant performance degradation when the amount of supervision signals, i.e., the number of labeled nodes, is limited, which is expected as GNNs are trained solely based on the supervision obtained from the labeled nodes. On the other hand, the recent self-supervised learning paradigm aims to train GNNs by solving pretext tasks that do not require any labeled nodes, and it has been shown to even outperform GNNs trained with few labeled nodes. However, a major drawback of self-supervised methods is that they fall short of learning class discriminative node representations since no labeled information is utilized during training. To this end, we propose a novel semi-supervised method for graphs, GraFN, that leverages few labeled nodes to ensure that nodes belonging to the same class are grouped together, thereby achieving the best of both worlds of semi-supervised and self-supervised methods. Specifically, GraFN randomly samples support nodes from labeled nodes and anchor nodes from the entire graph. Then, it minimizes the difference between two predicted class distributions that are non-parametrically assigned by anchor-support similarity from two differently augmented graphs. We experimentally show that GraFN surpasses both the semi-supervised and self-supervised methods in terms of node classification on real-world graphs. The source code for GraFN is available at https://github.com/Junseok0207/GraFN.  ( 2 min )
    DynLight: Realize dynamic phase duration with multi-level traffic signal control. (arXiv:2204.03471v1 [cs.AI])
    Adopting reinforcement learning (RL) for traffic signal control is increasingly popular. Most RL methods use a fixed action interval (denoted as $t_{duration}$) and actuate or maintain a phase every $t_{duration}$, which makes the phase duration less dynamic and flexible. In addition, the actuated phase can be arbitrary, affecting real-world deployment, which requires a fixed cyclical phase structure. To address these challenges, we propose a multi-level traffic signal control framework, DynLight, which uses an optimization method, Max-QueueLength (M-QL), to determine the phase and a deep Q-network to determine the corresponding duration. Based on DynLight, we further propose DynLight-C, which adopts the well-trained deep Q-network of DynLight and replaces M-QL with a fixed cyclical control policy that actuates a set of phases in a fixed order to realize a cyclical phase structure. Comprehensive experiments on multiple real-world datasets demonstrate that DynLight achieves a new state-of-the-art. Furthermore, the deep Q-network of DynLight learns to determine the phase duration well, and DynLight-C demonstrates high performance for deployment.  ( 2 min )
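    The M-QL phase-selection rule is simple enough to sketch directly: actuate the phase whose permitted movements have the largest total queue. A minimal illustration as we read the description (names and data are ours, not the authors' code):

```python
def max_queue_length_phase(queue_lengths_per_phase):
    """M-QL-style rule: return the phase whose movements have the
    largest total queue length."""
    totals = {p: sum(q) for p, q in queue_lengths_per_phase.items()}
    return max(totals, key=totals.get)

# Hypothetical intersection: queue lengths per movement in each phase.
phases = {"NS": [4, 7], "EW": [2, 3], "NSL": [9, 1]}
chosen = max_queue_length_phase(phases)   # -> "NS" (total 11 beats 5 and 10)
```

In DynLight this rule only picks *which* phase to actuate; the deep Q-network then decides *how long* to hold it, which is what makes the phase duration dynamic.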
    Multiplayer Performative Prediction: Learning in Decision-Dependent Games. (arXiv:2201.03398v2 [cs.GT] UPDATED)
    Learning problems commonly exhibit an interesting feedback mechanism wherein the population data reacts to competing decision makers' actions. This paper formulates a new game theoretic framework for this phenomenon, called "multi-player performative prediction". We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game. The latter equilibria are arguably more informative, but can be found efficiently only when the game is monotone. We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms, including repeated retraining and the repeated (stochastic) gradient method. We then establish transparent sufficient conditions for strong monotonicity of the game and use them to develop algorithms for finding Nash equilibria. We investigate derivative free methods and adaptive gradient algorithms wherein each player alternates between learning a parametric description of their distribution and gradient steps on the empirical risk. Synthetic and semi-synthetic numerical experiments illustrate the results.
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v1 [stat.CO])
    Although evaluation of expectations on the Ising model is essential in various applications, it is frequently infeasible because of intractable multiple summations (or integrations). Spatial Monte Carlo integration (SMCI) is a sampling-based approximation that can provide high-accuracy estimates of such intractable expectations. To evaluate the expectation of a function of variables in a specific region (called the target region), SMCI considers a larger region containing the target region (called the sum region). In SMCI, the multiple summation for the variables in the sum region is executed exactly, and that in the outer region is evaluated by a sampling approximation such as standard Monte Carlo integration. It is guaranteed that the accuracy of the SMCI estimator improves monotonically as the size of the sum region increases. However, a haphazard expansion of the sum region could cause a combinatorial explosion. We therefore aim to improve the accuracy without such region expansion. In this study, based on the theory of generalized least squares, a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).  ( 2 min )
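    The first-order case of SMCI is easy to sketch on a 1D Ising chain: the sum region is a single target spin, which is summed out exactly given its sampled neighbors, and only the rest of the chain is handled by Monte Carlo samples. A hypothetical illustration under these simplifying assumptions (not the paper's general implementation):

```python
import numpy as np

def smci_magnetization(samples, J, h):
    """First-order SMCI for E[s_i] on a 1D Ising chain: the sum region
    is the single spin {i}, whose conditional expectation given its
    sampled neighbors is tanh(h + J * (left + right)); the outer region
    is averaged over the Monte Carlo samples."""
    n = samples.shape[1]
    est = np.zeros(n)
    for i in range(n):
        left = samples[:, i - 1] if i > 0 else 0.0
        right = samples[:, i + 1] if i < n - 1 else 0.0
        est[i] = np.tanh(h + J * (left + right)).mean()
    return est

rng = np.random.default_rng(0)
spins = rng.choice([-1.0, 1.0], size=(500, 8))   # crude stand-in samples
m = smci_magnetization(spins, J=0.2, h=0.1)
```

Enlarging the sum region to several spins would require summing over their joint configurations exactly, which is where the combinatorial cost the paper avoids comes from.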
    Automated question generation and question answering from Turkish texts. (arXiv:2111.06476v4 [cs.LG] UPDATED)
    While exam-style questions are a fundamental educational tool serving a variety of purposes, manual construction of questions is a complex process that requires training, experience and resources. Automatic question generation (QG) techniques can be utilized to satisfy the need for a continuous supply of new questions by streamlining their generation. However, compared to automatic question answering (QA), QG is a more challenging task. In this work, we fine-tune a multilingual T5 (mT5) transformer in a multi-task setting for QA, QG and answer extraction tasks using Turkish QA datasets. To the best of our knowledge, this is the first academic work that performs automated text-to-text question generation from Turkish texts. Experimental evaluations show that the proposed multi-task setting achieves state-of-the-art Turkish question answering and question generation performance on TQuADv1, TQuADv2 datasets and XQuAD Turkish split. The source code and the pre-trained models are available at https://github.com/obss/turkish-question-generation.
    MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems. (arXiv:2204.03193v1 [stat.ML])
    A new data-driven method for operator learning of stochastic differential equations (SDEs) is proposed in this paper. The central goal is to solve forward and inverse stochastic problems more effectively using limited data. The deep operator network (DeepONet) has been proposed recently for operator learning. Compared to other neural networks that learn functions, it targets the problem of learning nonlinear operators. However, using the original model to learn nonlinear operators for high-dimensional stochastic problems can be challenging. We propose a new multi-resolution autoencoder DeepONet model, referred to as MultiAuto-DeepONet, to deal with this difficulty with the aid of a convolutional autoencoder. The encoder part of the network is designed to reduce the dimensionality as well as discover the hidden features of high-dimensional stochastic inputs. The decoder is designed to have a special structure, i.e., in the form of DeepONet. The first DeepONet in the decoder is designed to reconstruct the input function involving randomness, while the second one is used to approximate the solution of the desired equations. These two DeepONets share a common branch net and have two independent trunk nets. This architecture enables us to deal with multi-resolution inputs naturally. By adding $L_1$ regularization to our network, we found that the outputs from the branch net and the two trunk nets all have sparse structures. This reduces the number of trainable parameters in the neural network, thus making the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of our proposed MultiAuto-DeepONet model with uncertainty quantification.  ( 2 min )
    On Monte Carlo Tree Search for Weighted Vertex Coloring. (arXiv:2202.01665v2 [cs.LG] UPDATED)
    This work presents the first study of using the popular Monte Carlo Tree Search (MCTS) method combined with dedicated heuristics for solving the Weighted Vertex Coloring Problem. Starting with the basic MCTS algorithm, we gradually introduce a number of algorithmic variants where MCTS is extended by various simulation strategies including greedy and local search heuristics. We conduct experiments on well-known benchmark instances to assess the value of each studied combination. We also provide empirical evidence to shed light on the advantages and limits of each strategy.
    A survey on recently proposed activation functions for Deep Learning. (arXiv:2204.02921v2 [cs.LG] UPDATED)
    Artificial neural networks (ANNs), typically referred to simply as neural networks, are a class of machine learning algorithms inspired by the biological structure of the human brain, and they have achieved widespread success. Neural networks are inherently powerful due to their ability to learn complex function approximations from data. This generalization ability has been able to impact multidisciplinary areas involving image recognition, speech recognition, natural language processing, and others. Activation functions are a crucial sub-component of neural networks: they define the output of a node in the network given a set of inputs. This survey discusses the main concepts of activation functions in neural networks, including a brief introduction to deep neural networks; a summary of what activation functions are and how they are used in neural networks; their most common properties; the different types of activation functions; some of the challenges, limitations, and alternative solutions faced by activation functions; and final remarks.
    Covariance matrix preparation for quantum principal component analysis. (arXiv:2204.03495v1 [quant-ph])
    Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, with the data given by the ensemble $\{p_i,| \psi_i \rangle\}$, then one can easily prepare the ensemble average density matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i |$. We first show that $\overline{\rho}$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method is so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.
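    The centered-data claim is straightforward to check numerically: for a centered dataset, the ensemble average $\overline{\rho} = \sum_i p_i x_i x_i^T$ coincides with the population covariance matrix. A plain classical illustration (in the quantum setting the $|\psi_i\rangle$ would additionally be normalized amplitude encodings; the data here are made up):

```python
import numpy as np

# Toy centered classical dataset: rows are data vectors x_i, weights p_i.
X = np.array([[1.0, -1.0], [-1.0, 1.0], [2.0, 0.0], [-2.0, 0.0]])
p = np.full(len(X), 1.0 / len(X))
assert np.allclose(p @ X, 0.0)            # the dataset is centered

# Ensemble-average "density matrix" rho_bar = sum_i p_i x_i x_i^T
rho_bar = sum(pi * np.outer(x, x) for pi, x in zip(p, X))

# Population covariance of the centered data: sum_i p_i x_i x_i^T - mu mu^T,
# where mu = 0 here, so the second term vanishes.
cov = (X * p[:, None]).T @ X
```

For uncentered data the two differ by the rank-one term $\mu\mu^T$, which is exactly the gap that the paper's "PCA without centering" analysis bounds.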
    On the Effectiveness of Pretrained Models for API Learning. (arXiv:2204.03498v1 [cs.SE])
    Developers frequently use APIs to implement certain functionalities, such as parsing Excel Files, reading and writing text files line by line, etc. Developers can greatly benefit from automatic API usage sequence generation based on natural language queries for building applications in a faster and cleaner manner. Existing approaches utilize information retrieval models to search for matching API sequences given a query or use RNN-based encoder-decoder to generate API sequences. As it stands, the first approach treats queries and API names as bags of words. It lacks deep comprehension of the semantics of the queries. The latter approach adapts a neural language model to encode a user query into a fixed-length context vector and generate API sequences from the context vector. We want to understand the effectiveness of recent Pre-trained Transformer based Models (PTMs) for the API learning task. These PTMs are trained on large natural language corpora in an unsupervised manner to retain contextual knowledge about the language and have found success in solving similar Natural Language Processing (NLP) problems. However, the applicability of PTMs has not yet been explored for the API sequence generation task. We use a dataset that contains 7 million annotations collected from GitHub to evaluate the PTMs empirically. This dataset was also used to assess previous approaches. Based on our results, PTMs generate more accurate API sequences and outperform other related methods by around 11%. We have also identified two different tokenization approaches that can contribute to a significant boost in PTMs' performance for the API sequence generation task.
    Graph Neural Network-based Android Malware Classification with Jumping Knowledge. (arXiv:2201.07537v5 [cs.CR] UPDATED)
    This paper presents a new Android malware detection method based on Graph Neural Networks (GNNs) with Jumping-Knowledge (JK). Android function call graphs (FCGs) consist of a set of program functions and their inter-procedural calls. Thus, this paper proposes a GNN-based method for Android malware detection by capturing meaningful intra-procedural call path patterns. In addition, a Jumping-Knowledge technique is applied to minimize the effect of the over-smoothing problem, which is common in GNNs. The proposed method has been extensively evaluated using two benchmark datasets. The results demonstrate the superiority of our approach compared to state-of-the-art approaches in terms of key classification metrics, which demonstrates the potential of GNNs in Android malware detection and classification.
    Federated Learning with Erroneous Communication Links. (arXiv:2201.12991v2 [cs.LG] UPDATED)
    In this paper, we consider the federated learning (FL) problem in the presence of communication errors. We model the link between the devices and the central node (CN) by a packet erasure channel, where the local parameters from devices are either erased or received correctly by the CN with probability $e$ and $1-e$, respectively. We provide a mathematical proof of the convergence of the FL algorithm in the presence of communication errors, where the CN uses past local updates when fresh updates are not received from some devices. We show via simulations that by using the past local updates, the FL algorithm can converge in the presence of communication errors. We also show that when the dataset is uniformly distributed among devices, the FL algorithm that only uses fresh updates and discards missing updates might converge faster than the FL algorithm that uses past local updates.
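    One way to read the scheme is as server-side caching: an erased update is replaced by that device's last successfully received one. A hypothetical scalar sketch of the aggregation step (names and structure are ours, not the paper's code):

```python
import random

def aggregate_with_cache(fresh_updates, cache, erase_prob, rng):
    """Average local updates over a packet erasure channel: each fresh
    update arrives with probability 1 - erase_prob; devices whose packet
    was erased contribute their last cached update instead."""
    for device, update in fresh_updates.items():
        if rng.random() >= erase_prob:    # received correctly w.p. 1 - e
            cache[device] = update        # refresh the cached update
    return sum(cache.values()) / len(cache)

rng = random.Random(0)
cache = {0: 1.0, 1: 1.0}                  # last-known scalar "parameters"
avg = aggregate_with_cache({0: 2.0, 1: 4.0}, cache, erase_prob=0.5, rng=rng)
```

The alternative studied in the paper simply drops erased devices from the average for that round instead of substituting the cached values.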
    An Exploration of Active Learning for Affective Digital Phenotyping. (arXiv:2204.01915v2 [cs.LG] UPDATED)
    Some of the most severe bottlenecks preventing widespread development of machine learning models for human behavior include a dearth of labeled training data and the difficulty of acquiring high-quality labels. Active learning is a paradigm for using algorithms to computationally select a useful subset of data points to label, using metrics for model uncertainty and data similarity. We explore active learning for naturalistic computer vision emotion data, a particularly heterogeneous and complex data space due to inherently subjective labels. Using frames collected from gameplay of a therapeutic smartphone game for children with autism, we run a simulation of active learning using gameplay prompts as metadata to aid in the active learning process. We find that active learning using information generated during gameplay slightly outperforms random selection of the same number of labeled frames. We next investigate a method to conduct active learning with subjective data, such as in affective computing, where multiple crowdsourced labels can be acquired for each image. Using the Child Affective Facial Expression (CAFE) dataset, we simulate an active learning process for crowdsourcing many labels and find that prioritizing frames using the entropy of the crowdsourced label distribution results in lower categorical cross-entropy loss compared to random frame selection. Collectively, these results demonstrate pilot evaluations of two novel active learning approaches for subjective affective data collected in noisy settings.
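    The crowdsourced-label prioritization can be sketched with Shannon entropy over each frame's label counts: frames with the most rater disagreement are queued first. A toy illustration with made-up counts (not the CAFE data):

```python
import math

def label_entropy(counts):
    """Shannon entropy (nats) of a crowdsourced label distribution."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log(p) for p in probs)

frames = {
    "frame_a": {"happy": 9, "sad": 1},                # raters mostly agree
    "frame_b": {"happy": 4, "sad": 3, "neutral": 3},  # high disagreement
}

# Prioritize the most ambiguous (highest-entropy) frames for labeling.
priority = sorted(frames, key=lambda f: label_entropy(frames[f]),
                  reverse=True)  # priority[0] == "frame_b"
```

Spending the labeling budget on high-entropy frames targets exactly the subjective, ambiguous examples where extra crowdsourced labels carry the most information.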
    ECMG: Exemplar-based Commit Message Generation. (arXiv:2203.02700v2 [cs.SE] UPDATED)
    Commit messages concisely describe the content of code diffs (i.e., code changes) and the intent behind them. Recently, many approaches have been proposed to generate commit messages automatically. The information retrieval-based methods reuse the commit messages of similar code diffs, while the neural-based methods learn the semantic connection between code diffs and commit messages. However, the reused commit messages might not accurately describe the content/intent of code diffs, and neural-based methods tend to generate high-frequency and repetitive tokens from the corpus. In this paper, we combine the advantages of the two technical routes and propose a novel exemplar-based neural commit message generation model, which treats a similar commit message as an exemplar and leverages it to guide the neural network model to generate an accurate commit message. We perform extensive experiments, and the results confirm the effectiveness of our model.
    Causality, Causal Discovery, and Causal Inference in Structural Engineering. (arXiv:2204.01543v2 [cs.LG] UPDATED)
    Many of our experiments are designed to uncover the cause(s) and effect(s) behind a data generating mechanism (i.e., phenomenon) we happen to be interested in. Uncovering such relationships allows us to identify the true workings of a phenomenon and, most importantly, articulate a model that may enable us to further explore the phenomenon at hand and/or predict it accurately. Fundamentally, such models are likely to be derived via a causal approach (as opposed to observational or empirical means). In this approach, causal discovery is required to create a causal model, which can then be applied to infer the influence of interventions and answer any hypothetical questions (i.e., in the form of "what if" questions) that we might have. This paper builds a case for causal discovery and causal inference and contrasts them against traditional machine learning approaches, all from a civil and structural engineering perspective. More specifically, this paper outlines the key principles of causality and the most commonly used algorithms and packages for causal discovery and causal inference. Finally, this paper also presents a series of examples and case studies of how causal concepts can be adopted for our domain.
    VNIbCReg: VICReg with Neighboring-Invariance and better-Covariance Evaluated on Non-stationary Seismic Signal Time Series. (arXiv:2204.02697v2 [cs.LG] UPDATED)
    One of the latest self-supervised learning (SSL) methods, VICReg, showed great performance in both the linear evaluation and the fine-tuning evaluation. However, VICReg was proposed in computer vision: it learns by pulling together representations of random crops of an image while maintaining the representation space via variance and covariance losses. As a result, VICReg would be ineffective on non-stationary time series, where different parts/crops of the input should be encoded differently to account for the non-stationarity. Another recent SSL proposal, Temporal Neighborhood Coding (TNC), is effective for encoding non-stationary time series. This study shows that a combination of a VICReg-style method and TNC is very effective for SSL on non-stationary time series, where a non-stationary seismic signal time series is used as an evaluation dataset.
    Data-Centric Green AI: An Exploratory Empirical Study. (arXiv:2204.02766v2 [cs.LG] UPDATED)
    With the growing availability of large-scale datasets, and the popularization of affordable storage and computational capabilities, the energy consumed by AI is becoming a growing concern. To address this issue, in recent years, studies have focused on demonstrating how AI energy efficiency can be improved by tuning the model training strategy. Nevertheless, how modifications applied to datasets can impact the energy consumption of AI is still an open question. To fill this gap, in this exploratory study, we evaluate if data-centric approaches can be utilized to improve AI energy efficiency. To achieve our goal, we conduct an empirical experiment, executed by considering 6 different AI algorithms, a dataset comprising 5,574 data points, and two dataset modifications (number of data points and number of features). Our results show evidence that, by exclusively conducting modifications on datasets, energy consumption can be drastically reduced (up to 92.16%), often at the cost of a negligible or even absent accuracy decline. As additional introductory results, we demonstrate how, by exclusively changing the algorithm used, energy savings up to two orders of magnitude can be achieved. In conclusion, this exploratory investigation empirically demonstrates the importance of applying data-centric techniques to improve AI energy efficiency. Our results call for a research agenda that focuses on data-centric techniques, to further enable and democratize Green AI.
    Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning. (arXiv:2108.03706v2 [stat.ML] UPDATED)
    The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations, while existing statistical inference methods in reinforcement learning (RL) are limited to the batch setting. The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this paper, we study the use of the online bootstrap method for statistical inference in RL. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm at statistical inference tasks across a range of real RL environments.
    Convergence and Optimality of Policy Gradient Methods in Weakly Smooth Settings. (arXiv:2111.00185v2 [cs.LG] UPDATED)
    Policy gradient methods have been frequently applied to problems in control and reinforcement learning with great success, yet existing convergence analysis still relies on non-intuitive, impractical and often opaque conditions. In particular, existing rates are achieved in limited settings, under strict regularity conditions. In this work, we establish explicit convergence rates of policy gradient methods, extending the convergence regime to weakly smooth policy classes with $L_2$ integrable gradient. We provide intuitive examples to illustrate the insight behind these new conditions. Notably, our analysis also shows that convergence rates are achievable for both the standard policy gradient and the natural policy gradient algorithms under these assumptions. Lastly we provide performance guarantees for the converged policies.
    Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos. (arXiv:2111.14465v2 [cs.CV] UPDATED)
    We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video. To this end, we model the blurred appearance of a fast moving object in a generative fashion by parametrizing its 3D position, rotation, velocity, acceleration, bounces, shape, and texture over the duration of a predefined time window spanning multiple frames. Using differentiable rendering, we are able to estimate all parameters by minimizing the pixel-wise reprojection error to the input video via backpropagating through a rendering pipeline that accounts for motion blur by averaging the graphics output over short time intervals. For that purpose, we also estimate the camera exposure gap time within the same optimization. To account for abrupt motion changes like bounces, we model the motion trajectory as a piece-wise polynomial, and we are able to estimate the specific time of the bounce at sub-frame accuracy. Experiments on established benchmark datasets demonstrate that our method outperforms previous methods for fast moving object deblurring and 3D reconstruction.
    Federated Learning of Generative Image Priors for MRI Reconstruction. (arXiv:2202.04175v2 [eess.IV] UPDATED)
    Multi-institutional efforts can facilitate training of deep MRI reconstruction models, although privacy risks arise during cross-site sharing of imaging data. Federated learning (FL) has recently been introduced to address privacy concerns by enabling distributed training without transfer of imaging data. Existing FL methods for MRI reconstruction employ conditional models to map from undersampled to fully-sampled acquisitions via explicit knowledge of the imaging operator. Since conditional models generalize poorly across different acceleration rates or sampling densities, imaging operators must be fixed between training and testing, and they are typically matched across sites. To improve generalization and flexibility in multi-institutional collaborations, here we introduce a novel method for MRI reconstruction based on Federated learning of Generative IMage Priors (FedGIMP). FedGIMP leverages a two-stage approach: cross-site learning of a generative MRI prior, and subject-specific injection of the imaging operator. The global MRI prior is learned via an unconditional adversarial model that synthesizes high-quality MR images based on latent variables. Specificity in the prior is preserved via a mapper subnetwork that produces site-specific latents. During inference, the prior is combined with subject-specific imaging operators to enable reconstruction, and further adapted to individual test samples by minimizing data-consistency loss. Comprehensive experiments on multi-institutional datasets clearly demonstrate enhanced generalization performance of FedGIMP against site-specific and federated methods based on conditional models, as well as traditional reconstruction methods.
    Machine Learning based Medical Image Deepfake Detection: A Comparative Study. (arXiv:2109.12800v2 [cs.CV] UPDATED)
    Deep generative networks in recent years have reinforced the need for caution while consuming various modalities of digital information. One avenue of deepfake creation is aligned with injection and removal of tumors from medical scans. Failure to detect medical deepfakes can lead to large setbacks on hospital resources or even loss of life. This paper attempts to address the detection of such attacks with a structured case study. Specifically, we evaluate eight different machine learning algorithms, comprising three conventional machine learning methods (support vector machine, random forest, and decision tree) and five deep learning models (DenseNet121, DenseNet201, ResNet50, ResNet101, and VGG19), on distinguishing between tampered and untampered images. For the deep learning models, the five networks are used for feature extraction, after which each pre-trained model is fine-tuned. The findings of this work show near-perfect accuracy in detecting instances of tumor injections and removals.
    Reinforcement Learning with Almost Sure Constraints. (arXiv:2112.05198v2 [cs.LG] UPDATED)
    In this work we address the problem of finding feasible policies for Constrained Markov Decision Processes under probability one constraints. We argue that stationary policies are not sufficient for solving this problem, and that a rich class of policies can be found by endowing the controller with a scalar quantity, the so-called budget, that tracks how close the agent is to violating the constraint. We show that the minimal budget required to act safely can be obtained as the smallest fixed point of a Bellman-like operator, for which we analyze its convergence properties. We also show how to learn this quantity when the true kernel of the Markov decision process is not known, while providing sample-complexity bounds. The utility of knowing this minimal budget lies in the fact that it can aid the search for optimal or near-optimal policies by shrinking down the region of the state space the agent must navigate. Simulations illustrate the different nature of probability one constraints compared with the typically used constraints in expectation.
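The fixed-point characterization above can be sketched on a tabular MDP: iterate the Bellman-like operator from zero, where the "max" runs over the support of the transition kernel (not the expectation), because the constraint must hold almost surely. This is an illustrative reading of the idea, not the paper's algorithm.

```python
import numpy as np

def minimal_budget(P, cost, n_iter=50):
    """Smallest fixed point of the operator
    Delta(s) = min_a [ cost(s, a) + max_{s' reachable under (s, a)} Delta(s') ],
    iterated from Delta = 0. The max is over the *support* of P because
    the constraint must hold with probability one. Tabular sketch only.
    P: (S, A, S) transition probabilities; cost: (S, A) constraint costs."""
    delta = np.zeros(P.shape[0])
    for _ in range(n_iter):
        worst_next = np.where(P > 0, delta[None, None, :], -np.inf).max(axis=2)
        delta = (cost + worst_next).min(axis=1)
    return delta

# toy 3-state MDP: state 2 is an absorbing safe state with zero cost
P = np.zeros((3, 2, 3))
P[0, 0, 1] = P[0, 1, 2] = 1.0
P[1, 0, 2] = P[1, 1, 0] = 1.0
P[2, 0, 2] = P[2, 1, 2] = 1.0
cost = np.array([[1.0, 3.0], [1.0, 1.0], [0.0, 0.0]])
budget = minimal_budget(P, cost)   # budget needed to reach safety from each state
```

From state 0 the cheapest route to the safe state incurs cost 1 + 1 = 2, which the iteration recovers as the minimal budget.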
    Quantum Distributed Deep Learning Architectures: Models, Discussions, and Applications. (arXiv:2202.11200v3 [quant-ph] UPDATED)
    Although deep learning (DL) has already become a state-of-the-art technology for various data processing tasks, data security and computational overload problems often arise due to its high data and computational power dependency. To solve this problem, quantum deep learning (QDL) and distributed deep learning (DDL) have emerged to complement existing DL methods. Furthermore, a quantum distributed deep learning (QDDL) technique that combines and maximizes these advantages is gaining attention. This paper compares several model structures for QDDL and discusses their possibilities and limitations to leverage QDDL for some representative application scenarios.
    Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry. (arXiv:2103.15783v2 [cs.LG] UPDATED)
    Clustering algorithms partition a dataset into groups of similar points. The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm, which uses spatially-regularized diffusion distances to efficiently and accurately learn multiple scales of latent structure in hyperspectral images. The M-SRDL clustering algorithm extracts clusterings at many scales from a hyperspectral image and outputs these clusterings' variation of information-barycenter as an exemplar for all underlying cluster structure. We show that incorporating spatial regularization into a multiscale clustering framework results in smoother and more coherent clusters when applied to hyperspectral data, yielding more accurate clustering labels.
    SplitAVG: A heterogeneity-aware federated deep learning method for medical imaging. (arXiv:2107.02375v4 [cs.LG] UPDATED)
    Federated learning is an emerging research paradigm for enabling collaborative training of deep learning models without sharing patient data. However, the data are usually heterogeneous across institutions, which may reduce the performance of models trained using federated learning. In this study, we propose a novel heterogeneity-aware federated learning method, SplitAVG, to overcome the performance drops caused by data heterogeneity in federated learning. Unlike previous federated methods that require complex heuristic training or hyperparameter tuning, SplitAVG leverages simple network split and feature map concatenation strategies to encourage the federated model to train an unbiased estimator of the target data distribution. We compare SplitAVG with seven state-of-the-art federated learning methods, using centrally hosted training data as the baseline, on a suite of both synthetic and real-world federated datasets. We find that the performance of models trained using all the comparison federated learning methods degrades significantly with increasing degrees of data heterogeneity. In contrast, SplitAVG achieves results comparable to the baseline method under all heterogeneous settings: on highly heterogeneous data partitions it achieves 96.2% of the accuracy and 110.4% of the mean absolute error obtained by the baseline on a diabetic retinopathy binary classification dataset and a bone age prediction dataset, respectively. We conclude that SplitAVG can effectively overcome the performance drops from variability in data distributions across institutions. Experimental results also show that SplitAVG can be adapted to different base networks and generalized to various types of medical imaging tasks.
    GFlowNet Foundations. (arXiv:2111.09266v2 [cs.LG] UPDATED)
    Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.
    Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation. (arXiv:2111.14826v2 [cs.CV] UPDATED)
    The nonuniform quantization strategy for compressing neural networks usually achieves better performance than its counterpart, i.e., the uniform strategy, due to its superior representational capacity. However, many nonuniform quantization methods overlook the complicated projection process in implementing the nonuniformly quantized weights/activations, which incurs non-negligible time and space overhead in hardware deployment. In this study, we propose Nonuniform-to-Uniform Quantization (N2UQ), a method that can maintain the strong representation ability of nonuniform methods while being as hardware-friendly and efficient as uniform quantization for model inference. We achieve this through learning the flexible in-equidistant input thresholds to better fit the underlying distribution while quantizing these real-valued inputs into equidistant output levels. To train the quantized network with learnable input thresholds, we introduce a generalized straight-through estimator (G-STE) for intractable backward derivative calculation w.r.t. threshold parameters. Additionally, we consider entropy-preserving regularization to further reduce information loss in weight quantization. Even under this adverse constraint of imposing uniformly quantized weights and activations, our N2UQ outperforms state-of-the-art nonuniform quantization methods by 0.5~1.7 on ImageNet, demonstrating the contribution of N2UQ design. Code and models are available at: https://github.com/liuzechun/Nonuniform-to-Uniform-Quantization.
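To see the backward-pass problem that G-STE generalizes, consider plain uniform fake-quantization with the vanilla straight-through estimator: rounding has zero gradient almost everywhere, so training pretends quantization is the identity inside the clipping range. This is a generic sketch, not the paper's G-STE.

```python
import numpy as np

def fake_quantize(x, n_levels=4, lo=-1.0, hi=1.0):
    """Uniform fake-quantization: clip to [lo, hi] and round to one of
    n_levels equidistant values. Because rounding is piecewise constant,
    training relies on a straight-through estimator (STE): the gradient
    passes through unchanged inside [lo, hi] and is zero outside.
    Returns the quantized values and the STE gradient mask."""
    step = (hi - lo) / (n_levels - 1)
    xc = np.clip(x, lo, hi)
    q = lo + np.round((xc - lo) / step) * step
    ste_mask = ((x >= lo) & (x <= hi)).astype(x.dtype)  # dL/dx ≈ dL/dq * mask
    return q, ste_mask

x = np.array([-2.0, -0.4, 0.1, 0.9])
q, mask = fake_quantize(x)   # quantized to {-1, -1/3, 1/3, 1}
```

N2UQ replaces the fixed, equidistant input bins here with learnable, in-equidistant thresholds, which is exactly what makes the backward derivative intractable and motivates a generalized estimator.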
    Margin Calibration for Long-Tailed Visual Recognition. (arXiv:2112.07225v2 [cs.CV] UPDATED)
    The long-tailed class distribution in visual recognition tasks poses great challenges for neural networks on how to handle the biased predictions between head and tail classes, i.e., the model tends to classify tail classes as head classes. While existing research focused on data resampling and loss function engineering, in this paper, we take a different perspective: the classification margins. We study the relationship between the margins and logits (classification scores) and empirically observe the biased margins and the biased logits are positively correlated. We propose MARC, a simple yet effective MARgin Calibration function to dynamically calibrate the biased margins for unbiased logits. We validate MARC through extensive experiments on common long-tailed benchmarks including CIFAR-LT, ImageNet-LT, Places-LT, and iNaturalist-LT. Experimental results demonstrate that our MARC achieves favorable results on these benchmarks. In addition, MARC is extremely easy to implement with just three lines of code. We hope this simple method will motivate people to rethink the biased margins and biased logits in long-tailed visual recognition.
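A "three lines of code" margin calibration plausibly amounts to a per-class affine transform on the logits; the parametrization below (learnable per-class scale and shift) is an assumption for illustration, not the paper's exact definition of MARC.

```python
import numpy as np

def calibrate_logits(logits, w, b):
    """Margin-calibration sketch: rescale and shift each class's logit
    with per-class parameters so that tail-class margins are widened.
    The affine form w * z + b is an illustrative assumption.
    logits: (N, C); w, b: (C,) learnable in practice."""
    return w * logits + b

logits = np.array([[2.0, 0.5],
                   [1.0, 0.9]])
w = np.array([0.8, 1.2])   # shrink the head class, boost the tail class
b = np.array([0.0, 0.3])
cal = calibrate_logits(logits, w, b)
```

For the second sample the prediction flips from the head class to the tail class after calibration, which is the qualitative behavior the paper targets: correcting biased logits without retraining the backbone.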
    Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection. (arXiv:2202.06934v3 [cs.CV] UPDATED)
    Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by a small number of pixels in the image and lack sufficient detail, making them difficult to detect using conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed that provides a generic slicing aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without any fine-tuning. Experimental evaluations, using object detection baselines on the Visdrone and xView aerial object detection datasets, show that the proposed inference method can increase object detection AP by 6.8%, 5.1% and 5.3% for FCOS, VFNet and TOOD detectors, respectively. Moreover, the detection accuracy can be further increased with slicing aided fine-tuning, resulting in a cumulative increase of 12.7%, 13.4% and 14.5% AP in the same order. The proposed technique has been integrated with Detectron2, MMDetection and YOLOv5 models and it is publicly available at https://github.com/obss/sahi.git .
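The slicing geometry at the heart of the approach can be sketched in a few lines: tile the image into overlapping windows, run the detector on each tile, then shift the per-tile detections back to image coordinates and merge them (e.g. with NMS). The function below shows the slicing step only; the SAHI library's actual API differs.

```python
def slice_boxes(img_w, img_h, tile=512, overlap=0.2):
    """Compute overlapping slice windows (x0, y0, x1, y1) covering the
    image. Each window has the full tile size (windows near the far
    edge are shifted back so they still touch it). Illustrative sketch
    of slicing-aided inference, not SAHI's implementation."""
    stride = max(1, int(tile * (1 - overlap)))
    def starts(size):
        s = list(range(0, max(size - tile, 0) + 1, stride))
        if s[-1] + tile < size:          # last tile must reach the far edge
            s.append(size - tile)
        return s
    return [(x, y, min(x + tile, img_w), min(y + tile, img_h))
            for y in starts(img_h) for x in starts(img_w)]

tiles = slice_boxes(1000, 600)           # 3 x 2 grid of overlapping 512px tiles
```

Because each tile is inferred at full resolution, small objects occupy proportionally more pixels per forward pass, which is what drives the reported AP gains.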
    Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation. (arXiv:2203.05774v2 [eess.SY] UPDATED)
    In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG) agent by manipulating the cost signals. We show that a small falsification of the cost parameters will only lead to a bounded change in the optimal policy. The bound is linear in the amount of falsification the attacker can apply to the cost parameters. We propose an attack model where the attacker aims to mislead the agent into learning a 'nefarious' policy by intentionally falsifying the cost parameters. We formulate the attack's problem as a convex optimization problem and develop necessary and sufficient conditions to check the achievability of the attacker's goal. We showcase the adversarial manipulation on two types of LQG learners: the batch RL learner and the adaptive dynamic programming (ADP) learner. Our results demonstrate that with only 2.296% falsification of the cost data, the attacker misleads the batch RL learner into learning the 'nefarious' policy that leads the vehicle to a dangerous position. The attacker can also gradually trick the ADP learner into learning the same 'nefarious' policy by consistently feeding the learner a falsified cost signal that stays close to the actual cost signal. The paper aims to raise people's awareness of the security threats faced by RL-enabled control systems.
    Membership Inference Attacks Against Self-supervised Speech Models. (arXiv:2111.05113v2 [cs.CR] UPDATED)
    Recently, adapting the idea of self-supervised learning (SSL) on continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experiment results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage, with high Area Under the Curve (AUC) at both the utterance and speaker levels. Furthermore, we also conduct several ablation studies to understand the factors that contribute to the success of MIA.
    Scientific Discovery and the Cost of Measurement -- Balancing Information and Cost in Reinforcement Learning. (arXiv:2112.07535v2 [cs.LG] UPDATED)
    The use of reinforcement learning (RL) in scientific applications, such as materials design and automated chemistry, is increasing. A major challenge, however, lies in the fact that measuring the state of the system is often costly and time consuming in scientific applications, whereas policy learning with RL requires a measurement after each time step. In this work, we make the measurement costs explicit in the form of a costed reward and propose a framework that enables off-the-shelf deep RL algorithms to learn a policy for both selecting actions and determining whether or not to measure the current state of the system at each time step. In this way, the agents learn to balance the need for information with the cost of information. Our results show that when trained under this regime, the Dueling DQN and PPO agents can learn optimal action policies whilst making up to 50\% fewer state measurements, and recurrent neural networks can produce a greater than 50\% reduction in measurements. We postulate that these reductions can help to lower the barrier to applying RL to real-world scientific applications.
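The costed-reward idea can be sketched as an environment wrapper: each step the agent also decides whether to measure; measuring deducts a fixed penalty from the reward but refreshes the observation, while skipping the measurement re-uses the last observation. The gym-like interface and names below are illustrative assumptions, not the paper's API.

```python
class ChainEnv:
    """Toy environment: walk 3 steps, reward 1 per step."""
    def reset(self):
        self.s = 0
        return self.s
    def step(self, action):
        self.s += 1
        return self.s, 1.0, self.s >= 3

class CostedMeasureEnv:
    """Wrapper sketch: augments each action with a binary 'measure' flag.
    Measuring pays a fixed cost and reveals the true state; otherwise
    the agent keeps its stale last observation."""
    def __init__(self, env, measure_cost=0.1):
        self.env, self.measure_cost = env, measure_cost
        self.last_obs = None
    def reset(self):
        self.last_obs = self.env.reset()
        return self.last_obs
    def step(self, action, measure):
        obs, reward, done = self.env.step(action)
        if measure:
            reward -= self.measure_cost   # explicit price of information
            self.last_obs = obs           # observation refreshes only here
        return self.last_obs, reward, done

env = CostedMeasureEnv(ChainEnv())
o0 = env.reset()                          # true state 0, observed
o1, r1, d1 = env.step(0, measure=False)   # state advances, obs stays stale
o2, r2, d2 = env.step(0, measure=True)    # pay 0.1, obs refreshes
```

Any off-the-shelf agent can then be trained on the augmented (action, measure) space, which is how the framework keeps the underlying RL algorithm unchanged.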
    Control Theoretic Analysis of Temporal Difference Learning. (arXiv:2112.14417v4 [cs.AI] UPDATED)
    The goal of this paper is to investigate a control theoretic analysis of linear stochastic iterative algorithms and temporal difference (TD) learning. TD-learning is a linear stochastic iterative algorithm to estimate the value function of a given policy for a Markov decision process, and is one of the most popular and fundamental reinforcement learning algorithms. While there has been a series of successful works in theoretical analysis of TD-learning, it was not until recently that researchers found some guarantees on its statistical efficiency. In this paper, we propose a control theoretic finite-time analysis of TD-learning, which exploits standard notions in linear system control communities. Therefore, the proposed work provides additional insights on TD-learning and reinforcement learning with simple concepts and analysis tools in control theory.
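The linear-system view referenced above is easy to make concrete: the standard TD(0) update can be rewritten as the linear stochastic iteration theta <- theta + alpha * (b - A theta), where A = phi (phi - gamma phi')^T acts like a (random) system matrix driven by Markovian noise. The sketch below shows the equivalence of the two forms; it is the textbook update, not the paper's analysis.

```python
import numpy as np

def td0_step(theta, phi, phi_next, r, gamma, alpha):
    """One TD(0) update in linear-iteration form: with
    A = outer(phi, phi - gamma * phi_next) and b = r * phi,
    theta <- theta + alpha * (b - A @ theta) equals the usual
    theta <- theta + alpha * td_error * phi."""
    A = np.outer(phi, phi - gamma * phi_next)
    b = r * phi
    return theta + alpha * (b - A @ theta)

theta = np.zeros(2)
phi, phi_next = np.array([1.0, 0.0]), np.array([0.0, 1.0])
theta1 = td0_step(theta, phi, phi_next, r=1.0, gamma=0.9, alpha=0.1)
```

Viewing the expected A as a state matrix lets stability and finite-time bounds be read off from its spectral properties, which is the bridge to control theory the paper exploits.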
    Imitating, Fast and Slow: Robust learning from demonstrations via decision-time planning. (arXiv:2204.03597v1 [cs.LG])
    The goal of imitation learning is to mimic expert behavior from demonstrations, without access to an explicit reward signal. A popular class of approach infers the (unknown) reward function via inverse reinforcement learning (IRL) followed by maximizing this reward function via reinforcement learning (RL). The policies learned via these approaches are however very brittle in practice and deteriorate quickly even with small test-time perturbations due to compounding errors. We propose Imitation with Planning at Test-time (IMPLANT), a new meta-algorithm for imitation learning that utilizes decision-time planning to correct for compounding errors of any base imitation policy. In contrast to existing approaches, we retain both the imitation policy and the rewards model at decision-time, thereby benefiting from the learning signal of the two components. Empirically, we demonstrate that IMPLANT significantly outperforms benchmark imitation learning approaches on standard control environments and excels at zero-shot generalization when subject to challenging perturbations in test-time dynamics.
    Understanding Dynamics of Nonlinear Representation Learning and Its Application. (arXiv:2106.14836v3 [cs.LG] UPDATED)
    Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a pair of a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for the global convergence and necessary for the global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behaviors in the practical regime. Our results provide practical guidance for designing a model structure: e.g., the common model structure assumption can be used as a justification for using a particular model structure instead of others. We also derive a new training framework, which satisfies the data-architecture alignment condition by automatically modifying any given training algorithm. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive test performances while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization with datasets, including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST and SVHN.
    Discriminability-enforcing loss to improve representation learning. (arXiv:2202.07073v2 [cs.CV] UPDATED)
    During the training process, deep neural networks implicitly learn to represent the input data samples through a hierarchy of features, where the size of the hierarchy is determined by the number of layers. In this paper, we focus on enforcing the discriminative power of the high-level representations, that are typically learned by the deeper layers (closer to the output). To this end, we introduce a new loss term inspired by the Gini impurity, which is aimed at minimizing the entropy (increasing the discriminative power) of individual high-level features with respect to the class labels. Although our Gini loss induces highly-discriminative features, it does not ensure that the distribution of the high-level features matches the distribution of the classes. As such, we introduce another loss term to minimize the Kullback-Leibler divergence between the two distributions. We conduct experiments on two image classification data sets (CIFAR-100 and Caltech 101), considering multiple neural architectures ranging from convolutional networks (ResNet-17, ResNet-18, ResNet-50) to transformers (CvT). Our empirical results show that integrating our novel loss terms into the training objective consistently outperforms the models trained with cross-entropy alone, without increasing the inference time at all.
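One plausible reading of the Gini-style term (an illustrative assumption, not the paper's exact loss) is: for each high-level feature, form a distribution over classes from its mean activation per class, then penalize the impurity 1 - sum(p^2), so that a feature is rewarded for firing predominantly on a single class.

```python
import numpy as np

def gini_feature_loss(features, labels, n_classes, eps=1e-8):
    """Gini-impurity-style loss sketch. For each feature dimension,
    normalize its per-class mean activation into a distribution p over
    classes and average the impurity 1 - sum(p^2) over features.
    Low impurity = highly class-discriminative features.
    features: (N, D) non-negative activations; labels: (N,) ints."""
    D = features.shape[1]
    class_means = np.zeros((n_classes, D))
    for c in range(n_classes):
        class_means[c] = features[labels == c].mean(axis=0)
    p = class_means / (class_means.sum(axis=0, keepdims=True) + eps)  # (C, D)
    gini = 1.0 - (p ** 2).sum(axis=0)                                 # per feature
    return gini.mean()

# perfectly class-aligned features have near-zero impurity; shared ones do not
pure = gini_feature_loss(np.array([[1., 0.], [1., 0.], [0., 1.], [0., 1.]]),
                         np.array([0, 0, 1, 1]), 2)
mixed = gini_feature_loss(np.ones((4, 2)), np.array([0, 0, 1, 1]), 2)
```

In training this term would be added to the cross-entropy objective alongside the KL term matching the feature distribution to the class distribution, as described above.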
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v1 [cs.LG])
    Machine Learning with Deep Neural Networks (DNNs) has become a successful tool for solving tasks across various fields of application. The success of DNNs is strongly connected to their high complexity in terms of the number of network layers or of neurons in each layer, which severely complicates understanding how DNNs solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience because this field has a rich experience in analyzing complex and opaque systems. In this work, we draw inspiration from how neuroscience uses topographic maps to visualize the activity of the brain when it performs certain tasks. Transferring this approach to DNNs can help to visualize and understand their internal processes more intuitively, too. However, the inner structures of brains and DNNs differ substantially. Therefore, to be able to visualize activations of neurons in DNNs as topographic maps, we research techniques to lay out the neurons in a two-dimensional space in which neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare different methods to obtain a topographic layout of the neurons in a network layer. Moreover, we demonstrate how to use the resulting topographic activation maps to identify errors or encoded biases in DNNs or data sets. Our novel visualization technique improves the transparency of DNN-based algorithmic decision-making systems and is accessible to a broad audience because topographic maps are intuitive to interpret without expert knowledge in Machine Learning.
    Deep learning method for identifying mass composition of ultra-high-energy cosmic rays. (arXiv:2112.02072v2 [astro-ph.IM] UPDATED)
    We introduce a novel method for identifying the mass composition of ultra-high-energy cosmic rays using deep learning. The key idea of the method is to use a chain of two neural networks. The first network predicts the type of a primary particle for individual events, while the second infers the mass composition of an ensemble of events. We apply this method to the Monte-Carlo data for the Telescope Array Surface Detectors readings, on which it yields an unprecedented low error of 7% for 4-component approximation. We also discuss the problems of applying the developed method to the experimental data, and the way they can be resolved.
    Learning and Transferring Value Function for Robot Exploration in Subterranean Environments. (arXiv:2204.03140v1 [cs.RO])
    In traditional robot exploration methods, the robot usually does not have prior biases about the environment it is exploring. Thus the robot assigns equal importance to all goals, which leads to inefficient exploration. Alternatively, a hand-tuned policy is often used to tweak the values of goals. In this paper, we present a method to learn how "good" some states are, measured by the state value function, to provide a hint for the robot to make exploration decisions. We propose to learn state value functions from previously collected offline datasets and then transfer and improve the value function during testing in a new environment. Moreover, the environments usually provide few or even no extrinsic rewards or feedback for the robot. Therefore, in this work we also tackle the problem of sparse extrinsic rewards from the environments. We design several intrinsic rewards to encourage the robot to obtain more information during exploration. These reward functions then become the building blocks of the state value functions. We test our method on challenging subterranean and urban environments. To the best of our knowledge, this work is the first to demonstrate value function prediction with previously collected datasets to help exploration in challenging subterranean environments.
    Security Aspects of Quantum Machine Learning: Opportunities, Threats and Defenses. (arXiv:2204.03625v1 [cs.CR])
    In the last few years, quantum computing has experienced a growth spurt. One exciting avenue of quantum computing is quantum machine learning (QML), which can exploit the high dimensional Hilbert space to learn richer representations from limited data and thus can efficiently solve complex learning tasks. Despite the increased interest in QML, there have not been many studies that discuss the security aspects of QML. In this work, we explore the possible future applications of QML in the hardware security domain. We also expose the security vulnerabilities of QML and emerging attack models, and corresponding countermeasures.
    Heterogeneous Target Speech Separation. (arXiv:2204.03594v1 [cs.SD])
    We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location, etc). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts used as conditioning. Our experiments show that training separation models with heterogeneous conditions facilitates the generalization to new concepts with unseen out-of-domain data while also performing substantially higher than single-domain specialist models. Notably, such training leads to more robust learning of new harder source separation discriminative concepts and can yield improvements over permutation invariant training with oracle source selection. We analyze the intrinsic behavior of source separation training with heterogeneous metadata and propose ways to alleviate emerging problems with challenging separation conditions. We release the collection of preparation recipes for all datasets used to further promote research towards this challenging task.
    Covariate-assisted Sparse Tensor Completion. (arXiv:2103.06428v3 [stat.ML] UPDATED)
    We aim to provably complete a sparse and highly-missing tensor in the presence of covariate information along tensor modes. Our motivation comes from online advertising, where users' click-through rates (CTR) on ads over various devices form a CTR tensor that has about 96% missing entries and many zeros among the non-missing entries, which makes standalone tensor completion methods unsatisfactory. Besides the CTR tensor, additional ad features or user characteristics are often available. In this paper, we propose Covariate-assisted Sparse Tensor Completion (COSTCO) to incorporate covariate information for the recovery of the sparse tensor. The key idea is to jointly extract latent components from both the tensor and the covariate matrix to learn a synthetic representation. Theoretically, we derive the error bound for the recovered tensor components and explicitly quantify the improvements on both the reveal probability condition and the tensor recovery accuracy due to covariates. Finally, we apply COSTCO to an advertisement dataset consisting of a CTR tensor and ad covariate matrix, leading to 23% accuracy improvement over the baseline. An important by-product is that ad latent components from COSTCO reveal interesting ad clusters, which are useful for better ad targeting.
    Amortized Auto-Tuning: Cost-Efficient Bayesian Transfer Optimization for Hyperparameter Recommendation. (arXiv:2106.09179v2 [cs.LG] UPDATED)
    With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. However, after assessing 40 tuning methods systematically, we find that each faces certain limitations. In particular, methods that speed up tuning via knowledge transfer typically require the final performance of hyperparameters and do not focus on low-fidelity information. As we demonstrate empirically, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in additional observations and the need for performance forecasting. Therefore, we propose and conduct a thorough analysis of a multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.
    DAIS: Automatic Channel Pruning via Differentiable Annealing Indicator Search. (arXiv:2011.02166v2 [cs.CV] UPDATED)
    Convolutional neural networks have achieved great success in computer vision tasks, albeit with a large computation overhead that hinders efficient deployment. Structured (channel) pruning is usually applied to reduce model redundancy while preserving the network structure, so that the pruned network can be easily deployed in practice. However, existing structured pruning methods require hand-crafted rules, which can lead to a tremendous pruning space. In this paper, we introduce Differentiable Annealing Indicator Search (DAIS), which leverages the strength of neural architecture search in channel pruning and automatically searches for an effective pruned model under given constraints on computation overhead. Specifically, DAIS relaxes the binarized channel indicators to be continuous and then jointly learns both indicators and model parameters via bi-level optimization. To bridge the non-negligible discrepancy between the continuous model and the target binarized model, DAIS proposes an annealing-based procedure to steer the indicator convergence towards binarized states. Moreover, DAIS designs various regularizations based on a priori structural knowledge to control the pruning sparsity and to improve model performance. Experimental results show that DAIS outperforms state-of-the-art pruning methods on CIFAR-10, CIFAR-100, and ImageNet.
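    The annealing idea can be illustrated with a minimal sketch: a binarized channel indicator is relaxed into a temperature-controlled sigmoid, and lowering the temperature sharpens the gate towards a hard 0/1 state (the names `alpha` and `temperature` are illustrative, not the paper's notation):

```python
import math

def indicator(alpha, temperature):
    """Continuous relaxation of a binarized channel indicator.

    A sigmoid gate whose sharpness is controlled by a temperature; as the
    temperature anneals toward zero, the gate converges to a hard 0/1 state.
    """
    return 1.0 / (1.0 + math.exp(-alpha / temperature))

# The same indicator logit drifts toward a binarized state as the
# temperature is annealed.
for t in [1.0, 0.1, 0.01]:
    print(round(indicator(0.8, t), 4))
```

This keeps the indicator differentiable during search while making the final continuous-to-binary gap small.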
    Variational Autoencoder based Metamodeling for Multi-Objective Topology Optimization of Electrical Machines. (arXiv:2201.08877v2 [cs.LG] UPDATED)
    Conventional magneto-static finite element analysis of electrical machine designs is time-consuming and computationally expensive. Since each machine topology has a distinct set of parameters, design optimization is commonly performed independently per topology. This paper presents a novel method for predicting Key Performance Indicators (KPIs) of differently parameterized electrical machine topologies at the same time, by mapping high-dimensional integrated design parameters into a lower-dimensional latent space using a variational autoencoder. After training, the decoder and a multi-layer neural network function, via the latent space, as meta-models for sampling new designs and predicting the associated KPIs, respectively. This enables parameter-based concurrent multi-topology optimization.
    End-To-End Optimization of Online Neural Network-supported Two-Stage Dereverberation for Hearing Devices. (arXiv:2204.02978v1 [eess.AS])
    A two-stage online dereverberation algorithm for hearing devices is presented in this paper. The approach combines a multi-channel multi-frame linear filtering approach with a single-channel single-frame post-filter. Both components rely on power spectral density (PSD) estimates provided by deep neural networks (DNNs). This contribution extends our prior work, which showed that directly optimizing for a criterion at the output of the multi-channel linear filtering stage results in more efficient dereverberation than placing the criterion at the output of the DNN to optimize the PSD estimation. In the present work, we show that the dereverberation performance of the proposed first stage particularly improves the early-to-mid reverberation ratio when trained end-to-end. We thus argue that it can be combined with a post-filtering stage that benefits from the early-to-mid ratio improvement and is consequently able to efficiently suppress the residual late reverberation. The proposed two-stage procedure is shown to be very effective in terms of both dereverberation performance and computational demands. Furthermore, the proposed system can be adapted to the needs of different types of hearing-device users by controlling the amount of reduction of early reflections. The proposed system outperforms the previously proposed end-to-end DNN-supported linear filtering algorithm, as well as other traditional approaches, in an evaluation using the noise-free version of the WHAMR! dataset.
    A comparison of mixed-variables Bayesian optimization approaches. (arXiv:2111.01533v2 [math.OC] UPDATED)
    Most real optimization problems are defined over a mixed search space where the variables are both discrete and continuous. In engineering applications, the objective function is typically calculated with a numerically costly black-box simulation. General mixed and costly optimization problems are therefore of great practical interest, yet their resolution remains in large part an open scientific question. In this article, costly mixed problems are approached through Gaussian processes where the discrete variables are relaxed into continuous latent variables. The continuous space is more easily harvested by classical Bayesian optimization techniques than a mixed space would be. Discrete variables are recovered either subsequently to the continuous optimization, or simultaneously with an additional continuous-discrete compatibility constraint that is handled with augmented Lagrangians. Several possible implementations of such Bayesian mixed optimizers are compared. In particular, the reformulation of the problem with continuous latent variables is put in competition with searches working directly in the mixed space. Among the algorithms involving latent variables and an augmented Lagrangian, particular attention is devoted to the Lagrange multipliers, for which local and global estimation techniques are studied. The comparisons are based on the repeated optimization of three analytical functions and a beam design problem.
    Concentration Network for Reinforcement Learning of Large-Scale Multi-Agent Systems. (arXiv:2203.06416v2 [cs.AI] UPDATED)
    When dealing with a series of imminent issues, humans can naturally concentrate on a subset of these concerning issues by prioritizing them according to their contributions to motivational indices, e.g., the probability of winning a game. This idea of concentration offers insights into reinforcement learning of sophisticated Large-scale Multi-Agent Systems (LMAS) involving hundreds of agents. In such an LMAS, each agent receives a long series of entity observations at each step, which can overwhelm existing aggregation networks such as graph attention networks and cause inefficiency. In this paper, we propose a concentration network called ConcNet. First, ConcNet scores the observed entities considering several motivational indices, e.g., expected survival time and state value of the agents, and then ranks, prunes, and aggregates the encodings of observed entities to extract features. Second, distinct from the well-known attention mechanism, ConcNet has a unique motivational subnetwork to explicitly consider the motivational indices when scoring the observed entities. Furthermore, we present a concentration policy gradient architecture that can learn effective policies in LMAS from scratch. Extensive experiments demonstrate that the presented architecture has excellent scalability and flexibility, and significantly outperforms existing methods on LMAS benchmarks.
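    A minimal sketch of the score-rank-prune-aggregate step described above, with the motivational scores given directly rather than produced by a learned subnetwork, as they are in ConcNet:

```python
def concentrate(entity_encodings, scores, k):
    """Score-rank-prune-aggregate over observed entity encodings.

    Toy sketch of the concentration idea: rank entities by a motivational
    score, keep the top-k, and mean-pool the survivors. In ConcNet the
    scores come from a learned motivational subnetwork; here they are
    supplied directly.
    """
    ranked = sorted(zip(scores, entity_encodings), key=lambda p: -p[0])
    kept = [enc for _, enc in ranked[:k]]
    dim = len(kept[0])
    return [sum(enc[d] for enc in kept) / len(kept) for d in range(dim)]

# Three observed entities, keep the two with the highest scores.
features = concentrate([[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]],
                       scores=[0.9, 0.1, 0.5], k=2)
```

Pruning before aggregation keeps the per-step cost bounded even when an agent observes hundreds of entities.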
    On the Limitations of Multimodal VAEs. (arXiv:2110.04121v2 [cs.LG] UPDATED)
    Multimodal variational autoencoders (VAEs) have shown promise as efficient generative models for weakly-supervised data. Yet, despite their advantage of weak supervision, they exhibit a gap in generative quality compared to unimodal VAEs, which are completely unsupervised. In an attempt to explain this gap, we uncover a fundamental limitation that applies to a large family of mixture-based multimodal VAEs. We prove that the sub-sampling of modalities enforces an undesirable upper bound on the multimodal ELBO and thereby limits the generative quality of the respective models. Empirically, we showcase the generative quality gap on both synthetic and real data and present the tradeoffs between different variants of multimodal VAEs. We find that none of the existing approaches fulfills all desired criteria of an effective multimodal generative model when applied on more complex datasets than those used in previous benchmarks. In summary, we identify, formalize, and validate fundamental limitations of VAE-based approaches for modeling weakly-supervised data and discuss implications for real-world applications.
    Towards Improving Selective Prediction Ability of NLP Systems. (arXiv:2008.09371v3 [cs.CL] UPDATED)
    It's better to say "I can't answer" than to answer incorrectly. This selective prediction ability is crucial for NLP systems to be reliably deployed in real-world applications. Prior work has shown that existing selective prediction techniques fail to perform well, especially in the out-of-domain setting. In this work, we propose a method that improves probability estimates of models by calibrating them using prediction confidence and difficulty score of instances. Using these two signals, we first annotate held-out instances and then train a calibrator to predict the likelihood of correctness of the model's prediction. We instantiate our method with Natural Language Inference (NLI) and Duplicate Detection (DD) tasks and evaluate it in both In-Domain (IID) and Out-of-Domain (OOD) settings. In (IID, OOD) settings, we show that the representations learned by our calibrator result in an improvement of (15.81%, 5.64%) and (6.19%, 13.9%) over 'MaxProb' -- a selective prediction baseline -- on NLI and DD tasks respectively.
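    The selective prediction rule can be sketched with a hypothetical linear calibrator over the two signals the paper uses; the weights below are illustrative stand-ins for what is actually learned from annotated held-out instances:

```python
import math

def calibrate(confidence, difficulty, w=(2.0, -1.5), b=0.0):
    """Hypothetical linear calibrator over model confidence and instance
    difficulty, returning an estimated likelihood that the model's
    prediction is correct. The weights here are illustrative; in the
    paper they are learned from annotated held-out instances."""
    z = w[0] * confidence + w[1] * difficulty + b
    return 1.0 / (1.0 + math.exp(-z))

def selective_predict(confidence, difficulty, threshold=0.5):
    # Abstain ("I can't answer") when the estimated likelihood of
    # correctness falls below the threshold.
    return "answer" if calibrate(confidence, difficulty) >= threshold else "abstain"
```

An easy, high-confidence instance is answered, while a hard, low-confidence one triggers abstention.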
    Multi-Sample $\zeta$-mixup: Richer, More Realistic Synthetic Samples from a $p$-Series Interpolant. (arXiv:2204.03323v1 [cs.LG])
    Modern deep learning training procedures rely on model regularization techniques such as data augmentation methods, which generate training samples that increase the diversity of data and richness of label information. A popular recent method, mixup, uses convex combinations of pairs of original samples to generate new samples. However, as we show in our experiments, mixup can produce undesirable synthetic samples, where the data is sampled off the manifold and can contain incorrect labels. We propose $\zeta$-mixup, a generalization of mixup with provably and demonstrably desirable properties that allows convex combinations of $N \geq 2$ samples, leading to more realistic and diverse outputs that incorporate information from $N$ original samples by using a $p$-series interpolant. We show that, compared to mixup, $\zeta$-mixup better preserves the intrinsic dimensionality of the original datasets, which is a desirable property for training generalizable models. Furthermore, we show that our implementation of $\zeta$-mixup is faster than mixup, and extensive evaluation on controlled synthetic and 24 real-world natural and medical image classification datasets shows that $\zeta$-mixup outperforms mixup and traditional data augmentation techniques.
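    A minimal sketch of a $p$-series interpolant in the spirit of the abstract: weights proportional to $k^{-p}$ are normalized and used to convexly combine $N$ samples (an illustration of the idea, not the authors' exact formulation):

```python
def zeta_mixup_weights(n, p=2.0):
    """Normalized p-series weights w_k proportional to k^(-p).

    Larger p concentrates the mass on the first sample, keeping the
    synthetic point close to the data manifold near that sample.
    """
    raw = [k ** (-p) for k in range(1, n + 1)]
    s = sum(raw)
    return [r / s for r in raw]

def zeta_mixup(samples, p=2.0):
    """Convex combination of N samples using the p-series weights."""
    w = zeta_mixup_weights(len(samples), p)
    dim = len(samples[0])
    return [sum(w[i] * samples[i][d] for i in range(len(samples)))
            for d in range(dim)]
```

With N = 2 this reduces to a pairwise interpolation, which is how the method generalizes mixup.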
    Inference over radiative transfer models using variational and expectation maximization methods. (arXiv:2204.03346v1 [cs.LG])
    Earth observation from satellites offers the possibility to monitor our planet with unprecedented accuracy. Radiative transfer models (RTMs) encode the energy transfer through the atmosphere, and are used to model and understand the Earth system, as well as to estimate the parameters that describe the status of the Earth from satellite observations by inverse modeling. However, performing inference over such simulators is a challenging problem. RTMs are nonlinear, non-differentiable and computationally costly codes, which adds a high level of difficulty in inference. In this paper, we introduce two computational techniques to infer not only point estimates of biophysical parameters but also their joint distribution. One of them is based on a variational autoencoder approach and the second one is based on a Monte Carlo Expectation Maximization (MCEM) scheme. We compare and discuss benefits and drawbacks of each approach. We also provide numerical comparisons in synthetic simulations and the real PROSAIL model, a popular RTM that combines land vegetation leaf and canopy modeling. We analyze the performance of the two approaches for modeling and inferring the distribution of three key biophysical parameters for quantifying the terrestrial biosphere.
    Data Justice Stories: A Repository of Case Studies. (arXiv:2204.03100v1 [cs.CY])
    The idea of "data justice" is of recent academic vintage. It has arisen over the past decade in Anglo-European research institutions as an attempt to bring together a critique of the power dynamics that underlie accelerating trends of datafication with a normative commitment to the principles of social justice: a commitment to the achievement of a society that is equitable, fair, and capable of confronting the root causes of injustice. However, despite the seeming novelty of such a data justice pedigree, this joining up of the critique of the power imbalances that have shaped the digital and "big data" revolutions with a commitment to social equity and constructive societal transformation has a deeper historical, and more geographically diverse, provenance. As the stories of the data justice initiatives, activism, and advocacy contained in this volume well evidence, practices of data justice across the globe have, in fact, largely preceded the elaboration and crystallisation of the idea of data justice in contemporary academic discourse. In telling these data justice stories, we hope to provide the reader with two interdependent tools of data justice thinking: First, we aim to provide the reader with the critical leverage needed to discern those distortions and malformations of data justice that manifest in subtle and explicit forms of power, domination, and coercion. Second, we aim to provide the reader with access to the historically effective forms of normativity and ethical insight that have been marshalled by data justice activists and advocates as tools of societal transformation, so that these forms of normativity and insight can be drawn on, in turn, as constructive resources to spur future transformative data justice practices.
    Correcting Misproducted Speech using Spectrogram Inpainting. (arXiv:2204.03379v1 [eess.AS])
    Learning a new language involves constantly comparing speech productions with reference productions from the environment. Early in speech acquisition, children make articulatory adjustments to match their caregivers' speech. Grownup learners of a language tweak their speech to match the tutor reference. This paper proposes a method to synthetically generate correct pronunciation feedback given incorrect production. Furthermore, our aim is to generate the corrected production while maintaining the speaker's original voice. The system prompts the user to pronounce a phrase. The speech is recorded, and the samples associated with the inaccurate phoneme are masked with zeros. This waveform serves as an input to a speech generator, implemented as a deep learning inpainting system with a U-net architecture, and trained to output a reconstructed speech. The training set is composed of unimpaired proper speech examples, and the generator is trained to reconstruct the original proper speech. We evaluated the performance of our system on phoneme replacement of minimal pair words of English as well as on children with pronunciation disorders. Results suggest that human listeners slightly prefer our generated speech over a smoothed replacement of the inaccurate phoneme with a production of a different speaker.
    Accelerating Attention through Gradient-Based Learned Runtime Pruning. (arXiv:2204.03227v1 [cs.CL])
    Self-attention is a key enabler of state-of-the-art accuracy for various transformer-based Natural Language Processing models. This attention mechanism calculates a correlation score for each word with respect to the other words in a sentence. Commonly, only a small subset of words highly correlates with the word under attention, and this subset is only determined at runtime. As such, a significant amount of computation is inconsequential due to low attention scores and can potentially be pruned. The main challenge is finding the threshold for the scores below which subsequent computation will be inconsequential. Although such a threshold is discrete, this paper formulates its search through a soft differentiable regularizer integrated into the loss function of the training. This formulation piggybacks on the back-propagation training to analytically co-optimize the threshold and the weights simultaneously, striking a formally optimal balance between accuracy and computation pruning. To best utilize this mathematical innovation, we devise a bit-serial architecture, dubbed LeOPArd, for transformer language models with a bit-level early-termination microarchitectural mechanism. We evaluate our design across 43 back-end tasks for MemN2N, BERT, ALBERT, GPT-2, and Vision Transformer models. Post-layout results show that, on average, LeOPArd yields 1.9x speedup and 3.9x energy reduction, while keeping the average accuracy virtually intact (<0.2% degradation).
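    The runtime score pruning can be sketched as thresholding attention logits before the softmax; the learned threshold and the bit-serial early-termination hardware are not modeled in this toy version:

```python
import math

def pruned_attention_weights(scores, threshold):
    """Drop attention logits below a threshold, then softmax over the
    survivors; pruned positions receive exactly zero weight, so their
    value-vector multiply-accumulates can be skipped.

    Assumes at least one score survives the threshold.
    """
    kept = [(i, s) for i, s in enumerate(scores) if s >= threshold]
    z = sum(math.exp(s) for _, s in kept)
    weights = [0.0] * len(scores)
    for i, s in kept:
        weights[i] = math.exp(s) / z
    return weights
```

In the paper the threshold itself is co-optimized with the model weights via a differentiable regularizer; here it is simply given.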
    Pin the Memory: Learning to Generalize Semantic Segmentation. (arXiv:2204.03609v1 [cs.CV])
    The rise of deep neural networks has led to several breakthroughs in semantic segmentation. In spite of this, a model trained on a source domain often fails to work properly in new challenging domains, which directly concerns the generalization capability of the model. In this paper, we present a novel memory-guided domain generalization method for semantic segmentation based on a meta-learning framework. In particular, our method abstracts the conceptual knowledge of semantic classes into a categorical memory which is constant across domains. Building on the meta-learning concept, we repeatedly train memory-guided networks and simulate virtual tests to 1) learn how to memorize domain-agnostic and distinct information of classes and 2) offer an externally settled memory as class guidance to reduce the ambiguity of representations in the test data of an arbitrary unseen domain. To this end, we also propose memory divergence and feature cohesion losses, which encourage learning of memory reading and update processes for category-aware domain generalization. Extensive experiments on semantic segmentation demonstrate the superior generalization capability of our method over state-of-the-art works on various benchmarks.
    Class-Incremental Learning with Strong Pre-trained Models. (arXiv:2204.03634v1 [cs.CV])
    Class-incremental learning (CIL) has been widely studied under the setting of starting from a small number of classes (base classes). Instead, we explore an understudied real-world setting of CIL that starts with a strong model pre-trained on a large number of base classes. We hypothesize that a strong base model can provide a good representation for novel classes and incremental learning can be done with small adaptations. We propose a 2-stage training scheme, i) feature augmentation -- cloning part of the backbone and fine-tuning it on the novel data, and ii) fusion -- combining the base and novel classifiers into a unified classifier. Experiments show that the proposed method significantly outperforms state-of-the-art CIL methods on the large-scale ImageNet dataset (e.g. +10% overall accuracy than the best). We also propose and analyze understudied practical CIL scenarios, such as base-novel overlap with distribution shift. Our proposed method is robust and generalizes to all analyzed CIL settings.
    GNNLens: A Visual Analytics Approach for Prediction Error Diagnosis of Graph Neural Networks. (arXiv:2011.11048v6 [cs.HC] UPDATED)
    Graph Neural Networks (GNNs) aim to extend deep learning techniques to graph data and have achieved significant progress in graph analysis tasks (e.g., node classification) in recent years. However, similar to other deep neural networks like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), GNNs behave like a black box with their details hidden from model developers and users. It is therefore difficult to diagnose possible errors of GNNs. Despite many visual analytics studies being done on CNNs and RNNs, little research has addressed the challenges for GNNs. This paper fills the research gap with an interactive visual analysis tool, GNNLens, to assist model developers and users in understanding and analyzing GNNs. Specifically, Parallel Sets View and Projection View enable users to quickly identify and validate error patterns in the set of wrong predictions; Graph View and Feature Matrix View offer a detailed analysis of individual nodes to assist users in forming hypotheses about the error patterns. Since GNNs jointly model the graph structure and the node features, we reveal the relative influences of the two types of information by comparing the predictions of three models: GNN, Multi-Layer Perceptron (MLP), and GNN Without Using Features (GNNWUF). Two case studies and interviews with domain experts demonstrate the effectiveness of GNNLens in facilitating the understanding of GNN models and their errors.
    Joint Adaptive Graph and Structured Sparsity Regularization for Unsupervised Feature Selection. (arXiv:2010.05454v3 [cs.LG] UPDATED)
    Feature selection is an important data preprocessing step in data mining and machine learning, which can be used to reduce the feature dimension without deteriorating a model's performance. Since obtaining annotated data is laborious or even infeasible in many cases, unsupervised feature selection is more practical in reality. Though many methods for unsupervised feature selection have been proposed, these methods select features independently, so there is no guarantee that the group of selected features is optimal. Moreover, the number of selected features must be tuned carefully to obtain a satisfactory result. To tackle these problems, we propose a joint adaptive graph and structured sparsity regularization unsupervised feature selection (JASFS) method in this paper, in which an $l_{2,0}$-norm regularization term with respect to the transformation matrix is imposed on the manifold learning for feature selection, and a graph regularization term is incorporated into the learning model to learn the local geometric structure of the data adaptively. An efficient and simple iterative algorithm is designed to solve the proposed optimization problem, along with an analysis of its computational complexity. After optimization, a subset of optimal features is selected as a group, and the number of selected features is determined automatically. Experimental results on eight benchmarks demonstrate the effectiveness and efficiency of the proposed method compared with several state-of-the-art approaches.
    Fast Design Space Exploration of Nonlinear Systems: Part I. (arXiv:2104.01747v7 [cs.LG] UPDATED)
    System design tools are often only available as input-output blackboxes: for a given design as input, they compute an output representing system behavior. Blackboxes are intended to be run in the forward direction. This paper presents a new method of solving the inverse design problem: given requirements or constraints on the output, find an input that also optimizes an objective function. This problem is challenging for several reasons. First, blackboxes are not designed to be run in reverse. Second, inputs and outputs can be discrete and continuous. Third, finding designs concurrently satisfying a set of requirements is hard because designs satisfying individual requirements may conflict with each other. Fourth, blackbox evaluations can be expensive. Finally, blackboxes can sometimes fail to produce an output. This paper presents CNMA, a new method of solving the inverse problem that overcomes these challenges. CNMA tries to sample only the part of the design space relevant to solving the problem, leveraging the power of neural networks, Mixed Integer Linear Programs, and a new learning-from-failure feedback loop. The paper also presents a parallel version of CNMA that improves the efficiency and quality of solutions over the sequential version, and tries to steer it away from local optima. CNMA's performance is evaluated against conventional optimization methods on seven nonlinear design problems of 8 (two problems), 10, 15, 36 and 60 real-valued dimensions, and one with 186 binary dimensions. The conventional methods evaluated are off-the-shelf implementations of Bayesian Optimization with Gaussian Processes, Nelder-Mead, and Random Search. The first two do not solve problems that are high-dimensional, have both discrete and continuous variables, or whose blackboxes can fail to return values. CNMA solves all problems, and surpasses the performance of the conventional methods by up to 87%.
    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors. (arXiv:2204.03145v1 [stat.AP])
    DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures (e.g., manifolds) that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal component analysis (PCA). Furthermore, in contrast to the SVD and PCA, whose performance deteriorates when the tensor's entries deviate from additive white Gaussian noise, we demonstrate that the performance of DeepTensor is robust to a wide range of distributions. We validate that DeepTensor is a robust and computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification. In particular, DeepTensor offers a 6dB signal-to-noise ratio improvement over standard denoising methods for signals corrupted by Poisson noise and learns to decompose 3D tensors 60 times faster than a single DN equipped with 3D convolutions.
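    The decomposition idea can be sketched in its simplest form, a rank-1 matrix factorization fit by gradient descent on the mean-squared approximation error; in DeepTensor the factors are produced by deep networks, for which plain vectors stand in here:

```python
def rank1_fit(M, steps=2000, lr=0.05):
    """Fit a rank-1 factorization M ~ outer(u, v) by gradient descent on
    the mean-squared approximation error. In DeepTensor the low-rank
    factors are outputs of deep networks trained the same way; plain
    vectors stand in for those network outputs in this sketch."""
    rows, cols = len(M), len(M[0])
    u, v = [1.0] * rows, [1.0] * cols
    n = rows * cols
    for _ in range(steps):
        gu, gv = [0.0] * rows, [0.0] * cols
        for i in range(rows):
            for j in range(cols):
                e = u[i] * v[j] - M[i][j]   # elementwise residual
                gu[i] += 2.0 * e * v[j] / n
                gv[j] += 2.0 * e * u[i] / n
        u = [u[i] - lr * gu[i] for i in range(rows)]
        v = [v[j] - lr * gv[j] for j in range(cols)]
    return u, v
```

Replacing the free vectors with deep-network outputs is what lets the method capture nonlinear structure beyond the reach of the SVD.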
    MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids. (arXiv:2204.03305v1 [eess.AS])
    Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the test results as an evaluation metric. However, conducting large-scale listening tests is time-consuming and expensive. Therefore, several evaluation metrics were derived as surrogates for subjective listening test results. In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net), for predicting the subjective intelligibility scores of HA users. MBI-Net consists of two branches of models, with each branch consisting of a hearing loss model, a cross-domain feature extraction module, and a speech intelligibility prediction model, to process speech signals from one channel. The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system in Track 1 and Track 2 on the Clarity Prediction Challenge 2022 dataset.
    Evaluating Pre-Trained Models for User Feedback Analysis in Software Engineering: A Study on Classification of App-Reviews. (arXiv:2104.05861v3 [cs.SE] UPDATED)
    Context: Mobile app reviews written by users on app stores or social media are significant resources for app developers. Analyzing app reviews has proved to be useful for many areas of software engineering (e.g., requirements engineering, testing). Automatic classification of app reviews requires extensive effort to manually curate a labeled dataset. When the classification purpose changes (e.g., identifying bugs versus usability issues or sentiment), new datasets should be labeled, which prevents the extensibility of the developed models for new desired classes/tasks in practice. Recent pre-trained neural language models (PTMs) are trained on large corpora in an unsupervised manner and have found success in solving similar Natural Language Processing problems. However, the applicability of PTMs has not been explored for app review classification. Objective: We investigate the benefits of PTMs for app review classification compared to the existing models, as well as the transferability of PTMs in multiple settings. Method: We empirically study the accuracy and time efficiency of PTMs compared to prior approaches using six datasets from the literature. In addition, we investigate the performance of PTMs trained on app reviews (i.e., domain-specific PTMs). We set up different studies to evaluate PTMs in multiple settings: binary vs. multi-class classification, zero-shot classification (when new labels are introduced to the model), multi-task settings, and classification of reviews from different resources. The datasets are manually labeled app review datasets from the Google Play Store, Apple App Store, and Twitter data. In all cases, Micro and Macro Precision, Recall, and F1-scores will be used, and we will report the time required for training and prediction with the models.
    Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning. (arXiv:2004.10888v6 [cs.LG] UPDATED)
    We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging Mujoco robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in Mujoco domains.
    Differentially Private Set Union. (arXiv:2002.09745v2 [cs.CR] UPDATED)
    We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications such as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is significantly more delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
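    The prior approach the abstract describes (collect a bounded subset from each user, add noise to counts, release items above a threshold) can be sketched in a few lines. This is a hedged illustration of that baseline count-and-threshold mechanism only, not the paper's new policy-based algorithms; the function name and parameters are our assumptions:

```python
import random
from collections import Counter

def dp_set_union_baseline(user_sets, epsilon, threshold, max_contrib=1, seed=0):
    """Baseline count-and-threshold mechanism for private set union.

    Each user contributes at most `max_contrib` items (bounding sensitivity),
    item counts receive Laplace noise of scale max_contrib/epsilon, and only
    items whose noisy count clears `threshold` are released.
    """
    rng = random.Random(seed)
    counts = Counter()
    for items in user_sets:
        for item in list(items)[:max_contrib]:  # cap each user's contribution
            counts[item] += 1
    scale = max_contrib / epsilon
    released = set()
    for item, count in counts.items():
        # Laplace(0, scale) noise as the difference of two exponentials
        noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
        if count + noise > threshold:
            released.add(item)
    return released
```

The paper's contribution is precisely to replace the independent per-user contributions above with policy-guided, dependent contributions; this sketch only shows the wasteful baseline being improved upon.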
    Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention. (arXiv:2204.03479v1 [cs.CL])
    Multi-head self-attention forms the core of Transformer networks. However, their quadratically growing complexity with respect to the input sequence length impedes their deployment on resource-constrained edge devices. We address this challenge by proposing a dynamic pruning method, which exploits the temporal stability of data across tokens to reduce inference cost. The threshold-based method only retains significant differences between the subsequent tokens, effectively reducing the number of multiply-accumulates, as well as the internal tensor data sizes. The approach is evaluated on the Google Speech Commands Dataset for keyword spotting, and the performance is compared against the baseline Keyword Transformer. Our experiments show that we can reduce ~80% of operations while maintaining the original 98.4% accuracy. Moreover, a reduction of ~87-94% operations can be achieved when only degrading the accuracy by 1-4%, speeding up the multi-head self-attention inference by a factor of ~7.5-16.
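    The threshold-based delta pruning idea above (recompute only the tokens that changed significantly since the previous frame) can be illustrated with a minimal sketch; the function name, token layout, and the use of a max-absolute-change measure are assumptions for illustration, not details from the paper:

```python
def delta_prune(prev_tokens, curr_tokens, threshold):
    """Return indices of tokens whose change vs. the previous frame exceeds
    `threshold`; unchanged tokens can reuse cached attention results,
    skipping their multiply-accumulates."""
    updated = []
    for i, (prev, curr) in enumerate(zip(prev_tokens, curr_tokens)):
        delta = max(abs(p - c) for p, c in zip(prev, curr))
        if delta > threshold:
            updated.append(i)  # token must be recomputed this frame
    return updated
```

In a streaming keyword-spotting setting, most tokens between adjacent audio frames change little, which is why such a filter can remove a large fraction of operations at small accuracy cost.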
    Contrastive Learning Inverts the Data Generating Process. (arXiv:2102.08850v4 [cs.LG] UPDATED)
    Contrastive learning has recently seen tremendous success in self-supervised learning. So far, however, it is largely unclear why the learned representations generalize so effectively to a large variety of downstream tasks. We here prove that feedforward models trained with objectives belonging to the commonly used InfoNCE family learn to implicitly invert the underlying generative model of the observed data. While the proofs make certain statistical assumptions about the generative model, we observe empirically that our findings hold even if these assumptions are severely violated. Our theory highlights a fundamental connection between contrastive learning, generative modeling, and nonlinear independent component analysis, thereby furthering our understanding of the learned representations as well as providing a theoretical foundation to derive more effective contrastive losses.
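    As a point of reference for the InfoNCE family the abstract analyzes, here is a minimal single-anchor InfoNCE loss with cosine-similarity scores; the particular score function and temperature value are common conventions, not details taken from the paper:

```python
import math

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: cross-entropy of the positive pair against the negatives,
    using cosine similarity divided by a temperature as the score."""
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        nu = math.sqrt(sum(a * a for a in u))
        nv = math.sqrt(sum(b * b for b in v))
        return dot / (nu * nv)
    scores = [cos(anchor, positive) / temperature]
    scores += [cos(anchor, n) / temperature for n in negatives]
    m = max(scores)  # log-sum-exp stabilization
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]  # negative log-probability of the positive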
    Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes. (arXiv:2102.00135v6 [cs.LG] UPDATED)
    We present new policy mirror descent (PMD) methods for solving reinforcement learning (RL) problems with either strongly convex or general convex regularizers. By exploring the structural properties of these overall highly nonconvex problems we show that the PMD methods exhibit fast linear rate of convergence to the global optimality. We develop stochastic counterparts of these methods, and establish an ${\cal O}(1/\epsilon)$ (resp., ${\cal O}(1/\epsilon^2)$) sampling complexity for solving these RL problems with strongly (resp., general) convex regularizers using different sampling schemes, where $\epsilon$ denotes the target accuracy. We further show that the complexity for computing the gradients of these regularizers, if necessary, can be bounded by ${\cal O}\{(\log_\gamma \epsilon) [(1-\gamma)L/\mu]^{1/2}\log (1/\epsilon)\}$ (resp., ${\cal O} \{(\log_\gamma \epsilon ) (L/\epsilon)^{1/2}\}$) for problems with strongly (resp., general) convex regularizers. Here $\gamma$ denotes the discounting factor. To the best of our knowledge, these complexity bounds, along with our algorithmic developments, appear to be new in both the optimization and RL literature. The introduction of these convex regularizers also greatly expands the flexibility and applicability of RL models.
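    For intuition, a single PMD update under the KL (negative-entropy) Bregman divergence takes a multiplicative-weights form, $\pi'(a|s) \propto \pi(a|s)\exp(\eta Q(s,a))$. The sketch below shows only this unregularized tabular case; the data layout and function name are our assumptions:

```python
import math

def pmd_step(policy, q_values, stepsize):
    """One tabular policy mirror descent update with the KL Bregman
    divergence: pi'(a|s) is proportional to pi(a|s) * exp(stepsize * Q(s,a))."""
    new_policy = {}
    for s, probs in policy.items():
        weights = [p * math.exp(stepsize * q) for p, q in zip(probs, q_values[s])]
        z = sum(weights)  # normalize back onto the simplex
        new_policy[s] = [w / z for w in weights]
    return new_policy
```

The paper's methods add a (strongly) convex regularizer to this objective and analyze stochastic variants; this sketch only conveys the shape of the mirror-descent step.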
    Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers. (arXiv:2204.03503v1 [cs.CL])
    Automated short answer grading (ASAG) has gained attention in education as a means to scale educational tasks to the growing number of students. Recent progress in Natural Language Processing and Machine Learning has largely influenced the field of ASAG, of which we survey the recent research advancements. We complement previous surveys by providing a comprehensive analysis of recently published methods that deploy deep learning approaches. In particular, we focus our analysis on the transition from hand-engineered features to representation learning approaches, which learn representative features for the task at hand automatically from large corpora of data. We structure our analysis of deep learning methods along three categories: word embeddings, sequential models, and attention-based methods. Deep learning impacted ASAG differently than other fields of NLP, as we noticed that the learned representations alone do not suffice to achieve the best results, but rather work in a complementary way with hand-engineered features. The best performance is indeed achieved by methods that combine carefully hand-engineered features with the power of the semantic descriptions provided by the latest models, such as transformer architectures. We identify challenges and provide an outlook on research directions that can be addressed in the future.
    An Overview on Artificial Intelligence Techniques for Diagnosis of Schizophrenia Based on Magnetic Resonance Imaging Modalities: Methods, Challenges, and Future Works. (arXiv:2103.03081v2 [cs.LG] UPDATED)
    Schizophrenia (SZ) is a mental disorder that typically emerges in late adolescence or early adulthood. It reduces the life expectancy of patients by 15 years. Abnormal behavior, perception of emotions, social relationships, and reality perception are among its most significant symptoms. Past studies have revealed that the temporal and anterior lobes and the hippocampus regions of the brain are affected by SZ. Also, an increased volume of cerebrospinal fluid (CSF) and a decreased volume of white and gray matter can be observed due to this disease. Magnetic resonance imaging (MRI) is a popular neuroimaging technique used to explore structural/functional brain abnormalities in SZ disorder owing to its high spatial resolution. Various artificial intelligence (AI) techniques have been employed with advanced image/signal processing methods to obtain an accurate diagnosis of SZ. This paper presents a comprehensive overview of studies conducted on the automated diagnosis of SZ using MRI modalities. Main findings, various challenges, and future works in developing automated SZ detection are described in this paper.
    Unified Contrastive Learning in Image-Text-Label Space. (arXiv:2204.03610v1 [cs.CV])
    Visual recognition is recently learned via either supervised learning on human-annotated image-label data or language-image contrastive learning with webly-crawled image-text pairs. While supervised learning may result in a more discriminative representation, language-image pretraining shows unprecedented zero-shot recognition capability, largely due to the different properties of data sources and learning objectives. In this work, we introduce a new formulation by combining the two data sources into a common image-text-label space. In this space, we propose a new learning paradigm, called Unified Contrastive Learning (UniCL) with a single learning objective to seamlessly prompt the synergy of two data types. Extensive experiments show that our UniCL is an effective way of learning semantically rich yet discriminative representations, universally for image recognition in zero-shot, linear-probe, fully finetuning and transfer learning scenarios. Particularly, it attains gains up to 9.2% and 14.5% in average on zero-shot recognition benchmarks over the language-image contrastive learning and supervised learning methods, respectively. In linear probe setting, it also boosts the performance over the two methods by 7.3% and 3.4%, respectively. Our study also indicates that UniCL stand-alone is a good learner on pure image-label data, rivaling the supervised learning methods across three image classification datasets and two types of vision backbones, ResNet and Swin Transformer. Code is available at https://github.com/microsoft/UniCL.
    A Pathology-Based Machine Learning Method to Assist in Epithelial Dysplasia Diagnosis. (arXiv:2204.03572v1 [eess.IV])
    Epithelial Dysplasia (ED) is a tissue alteration commonly present in lesions preceding oral cancer, its presence being one of the most important factors in the progression toward carcinoma. This study proposes a method to design a low-computational-cost classification system to support the detection of dysplastic epithelia, contributing to reducing the variability of pathologist assessments. We employ a multilayer artificial neural network (MLP-ANN) and define the regions of the epithelium to be assessed based on the knowledge of the pathologist. The performance of the proposed solution was statistically evaluated. The implemented MLP-ANN presented an average accuracy of 87%, with variability much lower than that obtained from three trained evaluators. Moreover, the proposed solution led to results which are very close to those obtained using a convolutional neural network (CNN) implemented by transfer learning, with 100 times less computational complexity. In conclusion, our results show that a simple neural network structure can lead to a performance equivalent to that of much more complex structures, which are routinely used in the literature.
    Equivariance Discovery by Learned Parameter-Sharing. (arXiv:2204.03640v1 [cs.LG])
    Designing equivariance as an inductive bias into deep-nets has been a prominent approach to build effective models, e.g., a convolutional neural network incorporates translation equivariance. However, incorporating these inductive biases requires knowledge about the equivariance properties of the data, which may not be available, e.g., when encountering a new domain. To address this, we study how to discover interpretable equivariances from data. Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes. We propose to use the partition distance to empirically quantify the accuracy of the recovered equivariance. Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme. Empirically, we show that the approach recovers known equivariances, such as permutations and shifts, on sum of numbers and spatially-invariant data.
    Bidimensional linked matrix factorization for pan-omics pan-cancer analysis. (arXiv:2002.02601v2 [stat.ML] UPDATED)
    Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogeneity beyond what was observed in single tumor and single platform studies. However, these studies have been limited by available statistical methodology. We propose a flexible approach to the simultaneous factorization and decomposition of variation across such bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., cancer types). This builds on a growing literature for the factorization and decomposition of linked matrices, which has primarily focused on multiple matrices that are linked in one dimension (rows or columns) only. Our objective function extends nuclear norm penalization, is motivated by random matrix theory, gives an identifiable decomposition under relatively mild conditions, and can be shown to give the mode of a Bayesian posterior distribution. We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.
    Interval Bound Propagation--aided Few-shot Learning. (arXiv:2204.03511v1 [cs.LG])
    Few-shot learning aims to transfer the knowledge acquired from training on a diverse set of tasks, from a given task distribution, to generalize to unseen tasks, from the same distribution, with a limited amount of labeled data. The underlying requirement for effective few-shot generalization is to learn a good representation of the task manifold. One way to encourage this is to preserve local neighborhoods in the feature space learned by the few-shot learner. To this end, we introduce the notion of interval bounds from the provably robust training literature to few-shot learning. The interval bounds are used to characterize neighborhoods around the training tasks. These neighborhoods can then be preserved by minimizing the distance between a task and its respective bounds. We further introduce a novel strategy to artificially form new tasks for training by interpolating between the available tasks and their respective interval bounds, to aid in cases with a scarcity of tasks. We apply our framework to both model-agnostic meta-learning as well as prototype-based metric-learning paradigms. The efficacy of our proposed approach is evident from the improved performance on several datasets from diverse domains in comparison to a sizable number of recent competitors.
    An optimized hybrid solution for IoT based lifestyle disease classification using stress data. (arXiv:2204.03573v1 [eess.SP])
    Stress, anxiety, and nervousness are all high-risk health states in everyday life. Previously, stress levels were determined by speaking with people and gaining insight into what they had experienced recently or in the past. Typically, stress is caused by an incident that occurred a long time ago, but sometimes it is triggered by unknown factors. This is a challenging and complex task, but recent research advances have provided numerous opportunities to automate it. The fundamental features of most of these techniques are electrodermal activity (EDA) and heart rate variability (HRV). We utilized an accelerometer to measure body motions to solve this challenge. The proposed novel method employs a test that measures a subject's electrocardiogram (ECG), galvanic skin values (GSV), HRV values, and body movements in order to provide a low-cost and time-saving solution for detecting stress-related lifestyle disease in modern times using cyber-physical systems. This study provides a new hybrid model for lifestyle disease classification that decreases execution time while picking the best collection of characteristics and increases classification accuracy. The developed approach is capable of dealing with the class imbalance problem by using the WESAD (wearable stress and affect dataset) dataset. The new model uses the grid search (GS) method to select an optimized set of hyperparameters, and it uses a combination of the correlation-coefficient-based recursive feature elimination (CoC-RFE) method for optimal feature selection and gradient boosting as an estimator to classify the dataset, which achieves high accuracy and helps to provide smart, accurate, and high-quality healthcare systems. To demonstrate the validity and utility of the proposed methodology, its performance is compared to those of other well-established machine learning models.
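    The correlation-coefficient-based recursive feature elimination (CoC-RFE) step can be sketched as repeatedly dropping the feature least correlated with the label; this is our illustrative reading of the acronym under stated assumptions, not the authors' exact procedure:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def coc_rfe(features, labels, n_keep):
    """Recursively drop the feature least correlated (in absolute value)
    with the label until `n_keep` feature indices remain."""
    remaining = list(range(len(features)))
    while len(remaining) > n_keep:
        weakest = min(remaining, key=lambda i: abs(pearson(features[i], labels)))
        remaining.remove(weakest)
    return remaining
```

In the paper's pipeline, the surviving feature subset would then feed a gradient boosting classifier whose hyperparameters are tuned by grid search.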
    RL-QN: A Reinforcement Learning Framework for Optimal Control of Queueing Systems. (arXiv:2011.07401v2 [cs.PF] UPDATED)
    With the rapid advance of information technology, network systems have become increasingly complex and hence the underlying system dynamics are often unknown or difficult to characterize. Finding a good network control policy is of significant importance to achieve desirable network performance (e.g., high throughput or low delay). In this work, we consider using model-based reinforcement learning (RL) to learn the optimal control policy for queueing networks so that the average job delay (or equivalently the average queue backlog) is minimized. Traditional approaches in RL, however, cannot handle the unbounded state spaces of the network control problem. To overcome this difficulty, we propose a new algorithm, called Reinforcement Learning for Queueing Networks (RL-QN), which applies model-based RL methods over a finite subset of the state space, while applying a known stabilizing policy for the rest of the states. We establish that the average queue backlog under RL-QN with an appropriately constructed subset can be arbitrarily close to the optimal result. We evaluate RL-QN in dynamic server allocation, routing and switching problems. Simulation results show that RL-QN minimizes the average queue backlog effectively.
    Position-based Prompting for Health Outcome Generation. (arXiv:2204.03489v1 [cs.CL])
    Probing Pre-trained Language Models (PLMs) using prompts has indirectly implied that language models (LMs) can be treated as knowledge bases. This phenomenon has been especially effective when these LMs are fine-tuned not just on data of a specific domain, but also on the style or linguistic pattern of the prompts themselves. We observe that satisfying a particular linguistic pattern in prompts is an unsustainable constraint that unnecessarily lengthens the probing task, especially because prompts are often manually designed and the range of possible prompt template patterns can vary depending on the prompting objective and domain. We therefore explore the idea of using a position-attention mechanism to capture the positional information of each word in a prompt relative to the mask to be filled, hence avoiding the need to re-construct prompts when the prompt's linguistic pattern changes. Using our approach, we demonstrate the ability to elicit answers to rare prompt templates (in a case study on health outcome generation) such as Postfix and Mixed patterns, whose missing information is respectively at the start and in multiple random places of the prompt. Moreover, using various biomedical PLMs, our approach consistently outperforms a baseline in which the default masked language model (MLM) representation is used to predict masked tokens.
    Modeling Label Correlations for Second-Order Semantic Dependency Parsing with Mean-Field Inference. (arXiv:2204.03619v1 [cs.CL])
    Second-order semantic parsing with end-to-end mean-field inference has shown good performance. In this work we aim to improve this method by modeling label correlations between adjacent arcs. However, direct modeling leads to memory explosion because second-order score tensors have sizes of $O(n^3L^2)$ ($n$ is the sentence length and $L$ is the number of labels), which is not affordable. To tackle this computational challenge, we leverage tensor decomposition techniques, and interestingly, we show that the large second-order score tensors need not be materialized during mean-field inference, thereby reducing the computational complexity from cubic to quadratic. We conduct experiments on the SemEval 2015 Task 18 English datasets, showing the effectiveness of modeling label correlations. Our code is publicly available at https://github.com/sustcsonglin/mean-field-dep-parsing.
    FedCos: A Scene-adaptive Federated Optimization Enhancement for Performance Improvement. (arXiv:2204.03174v1 [cs.LG])
    As an emerging technology, federated learning (FL) involves training machine learning models over distributed edge devices, which attracts sustained attention and has been extensively studied. However, the heterogeneity of client data severely degrades the performance of FL compared with that of centralized training. It causes the locally trained models of clients to move in different directions. On the one hand, it slows down or even stalls the global updates, leading to inefficient communication. On the other hand, it enlarges the distances between local models, resulting in an aggregated global model with poor performance. Fortunately, these shortcomings can be mitigated by reducing the angle between the directions that local models move in. Based on this fact, we propose FedCos, which reduces the directional inconsistency of local models by introducing a cosine-similarity penalty. It promotes local model iterations towards an auxiliary global direction. Moreover, our approach adapts automatically to various non-IID settings without an elaborate selection of hyperparameters. The experimental results show that FedCos outperforms well-known baselines and can enhance them under a variety of FL scenarios, including varying degrees of data heterogeneity, different numbers of participants, and cross-silo and cross-device settings. Besides, FedCos improves communication efficiency by 2 to 5 times. With the help of FedCos, multiple FL methods require significantly fewer communication rounds than before to obtain a model with comparable performance.
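    The cosine-similarity penalty at the heart of FedCos can be sketched as a term added to each client's local objective; how the auxiliary global direction is formed and how the penalty is weighted are assumptions in this illustration:

```python
import math

def fedcos_penalty(local_delta, global_direction, mu):
    """Cosine-similarity penalty added to a client's local objective:
    mu * (1 - cos(local update direction, auxiliary global direction)).
    It is zero when the local model moves along the global direction and
    grows as the directions diverge."""
    dot = sum(a * b for a, b in zip(local_delta, global_direction))
    nl = math.sqrt(sum(a * a for a in local_delta))
    ng = math.sqrt(sum(b * b for b in global_direction))
    return mu * (1.0 - dot / (nl * ng))
```

Minimizing the local loss plus this penalty nudges each client's update toward the shared direction, shrinking the inter-client angle that the abstract identifies as the source of slow or stalled global progress.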
    Distributed NLI: Learning to Predict Human Opinion Distributions for Language Reasoning. (arXiv:2104.08676v2 [cs.CL] UPDATED)
    We introduce distributed NLI, a new NLU task whose goal is to predict the distribution of human judgements for natural language inference. We show that by applying additional distribution estimation methods, namely, Monte Carlo (MC) Dropout, Deep Ensemble, Re-Calibration, and Distribution Distillation, models can capture human judgement distributions more effectively than the softmax baseline. We show that MC Dropout is able to achieve decent performance without any distribution annotations, while Re-Calibration can give further improvements with extra distribution annotations, suggesting the value of multiple annotations per example in modeling the distribution of human judgements. Despite these improvements, the best results are still far below the estimated human upper bound, indicating that predicting the distribution of human judgements is still an open, challenging problem with substantial room for improvement. We showcase the common errors for MC Dropout and Re-Calibration. Finally, we give guidelines on the usage of these methods with different levels of data availability and encourage future work on modeling the human opinion distribution for language reasoning. Our code and data are publicly available at https://github.com/easonnie/ChaosNLI
    Statistical Model Criticism of Variational Auto-Encoders. (arXiv:2204.03030v1 [cs.LG])
    We propose a framework for the statistical evaluation of variational auto-encoders (VAEs) and test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text. Our take on evaluation is based on the idea of statistical model criticism, popular in Bayesian data analysis, whereby a statistical model is evaluated in terms of its ability to reproduce statistics of an unknown data generating process from which we can obtain samples. A VAE learns not one, but two joint distributions over a shared sample space, each exploiting a choice of factorisation that makes sampling tractable in one of two directions (latent-to-data, data-to-latent). We evaluate samples from these distributions, assessing their (marginal) fit to the observed data and our choice of prior, and we also evaluate samples through a pipeline that connects the two distributions starting from a data sample, assessing whether together they exploit and reveal latent factors of variation that are useful to a practitioner. We show that this methodology offers possibilities for model selection qualitatively beyond intrinsic evaluation metrics and at a finer granularity than commonly used statistics can offer.
    Improving Urban Mobility: using artificial intelligence and new technologies to connect supply and demand. (arXiv:2204.03570v1 [cs.CY])
    As the demand for mobility in our society seems to increase, the various issues centered on urban mobility are among those that worry most city inhabitants on this planet. For instance, how to go from A to B in an efficient (but also less stressful) way? These questions and concerns have not changed even during the covid-19 pandemic; on the contrary, as things currently stand, people who are avoiding public transportation are only contributing to an increase in vehicular traffic. The area of intelligent transportation systems (ITS) aims at investigating how to employ information and communication technologies for problems related to transportation. This may mean monitoring and managing the infrastructure (e.g., traffic roads, traffic signals, etc.). However, currently, ITS is also targeting the management of demand. In this panorama, artificial intelligence plays an important role, especially with the advances in machine learning that translate into the use of computational vision, connected and autonomous vehicles, agent-based simulation, among others. In the present work, a survey of several works developed by our group is discussed from a holistic perspective, i.e., they cover not only the supply side (as commonly found in ITS works), but also the demand side, and, in a novel perspective, the integration of both.
    Neural Implicit Flow: a mesh-agnostic dimensionality reduction paradigm of spatio-temporal data. (arXiv:2204.03216v1 [cs.LG])
    High-dimensional spatio-temporal dynamics can often be encoded in a low-dimensional subspace. Engineering applications for modeling, characterization, design, and control of such large-scale systems often rely on dimensionality reduction to make solutions computationally tractable in real-time. Common existing paradigms for dimensionality reduction include linear methods, such as the singular value decomposition (SVD), and nonlinear methods, such as variants of convolutional autoencoders (CAE). However, these encoding techniques lack the ability to efficiently represent the complexity associated with spatio-temporal data, which often requires variable geometry, non-uniform grid resolution, adaptive meshing, and/or parametric dependencies. To resolve these practical engineering challenges, we propose a general framework called Neural Implicit Flow (NIF) that enables a mesh-agnostic, low-rank representation of large-scale, parametric, spatial-temporal data. NIF consists of two modified multilayer perceptrons (MLPs): (i) ShapeNet, which isolates and represents the spatial complexity, and (ii) ParameterNet, which accounts for any other input complexity, including parametric dependencies, time, and sensor measurements. We demonstrate the utility of NIF for parametric surrogate modeling, enabling the interpretable representation and compression of complex spatio-temporal dynamics, efficient many-spatial-query tasks, and improved generalization performance for sparse reconstruction.
    Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results. (arXiv:2204.03475v1 [cs.CV])
    ImageNet serves as the primary dataset for evaluating the quality of computer-vision models. The common practice today is training each architecture with a tailor-made scheme, designed and tuned by an expert. In this paper, we present a unified scheme for training any backbone on ImageNet. The scheme, named USI (Unified Scheme for ImageNet), is based on knowledge distillation and modern tricks. It requires no adjustments or hyper-parameters tuning between different models, and is efficient in terms of training times. We test USI on a wide variety of architectures, including CNNs, Transformers, Mobile-oriented and MLP-only. On all models tested, USI outperforms previous state-of-the-art results. Hence, we are able to transform training on ImageNet from an expert-oriented task to an automatic seamless routine. Since USI accepts any backbone and trains it to top results, it also enables to perform methodical comparisons, and identify the most efficient backbones along the speed-accuracy Pareto curve. Implementation is available at: https://github.com/Alibaba-MIIL/Solving_ImageNet
    Optimizing the Long-Term Behaviour of Deep Reinforcement Learning for Pushing and Grasping. (arXiv:2204.03487v1 [cs.LG])
    We investigate the "Visual Pushing for Grasping" (VPG) system by Zeng et al. and the "Hourglass" system by Ewerton et al., an evolution of the former. The focus of our work is the investigation of the capabilities of both systems to learn long-term rewards and policies. Zeng et al.'s original task only needs a limited amount of foresight. Ewerton et al. attain their best performance using an agent which only takes the most immediate action under consideration. We are interested in the ability of their models and training algorithms to accurately predict long-term Q-Values. To evaluate this ability, we design a new bin sorting task and reward function. Our task requires agents to accurately estimate future rewards and therefore use high discount factors in their Q-Value calculation. We investigate the behaviour of an adaptation of the VPG training algorithm on our task. We show that this adaptation cannot accurately predict the required long-term action sequences. In addition to the limitations identified by Ewerton et al., it suffers from the known Deep Q-Learning problem of overestimated Q-Values. In an effort to solve our task, we turn to the Hourglass models and combine them with the Double Q-Learning approach. We show that this approach enables the models to accurately predict long-term action sequences when trained with large discount factors. Our results show that the Double Q-Learning technique is essential for training with very high discount factors, as the models' Q-Value predictions diverge otherwise. We also experiment with different approaches for discount factor scheduling, loss calculation and exploration procedures. Our results show that the latter factors do not visibly influence the model's performance on our task.
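    The Double Q-Learning idea the authors combine with the Hourglass models decouples action selection from action evaluation to curb Q-value overestimation; a minimal tabular sketch of the target computation (data layout assumed for illustration) is:

```python
def double_q_target(q_online, q_target, reward, next_state, gamma):
    """Double Q-learning target: the online network selects the next action,
    while the target network evaluates it, reducing the overestimation bias
    that arises when one network both selects and evaluates."""
    next_action = max(
        range(len(q_online[next_state])),
        key=lambda a: q_online[next_state][a],
    )
    return reward + gamma * q_target[next_state][next_action]
```

With a single network, the max operator both picks and scores the action, so estimation noise is systematically amplified; splitting the two roles is what keeps predictions stable at the very high discount factors the bin-sorting task requires.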
    Learning to Compose Soft Prompts for Compositional Zero-Shot Learning. (arXiv:2204.03574v1 [cs.LG])
    We introduce compositional soft prompting (CSP), a parameter-efficient learning technique to improve the zero-shot compositionality of large-scale pretrained vision-language models (VLMs) without the overhead of fine-tuning the entire model. VLMs can represent arbitrary classes as natural language prompts in their flexible text encoders but they underperform state-of-the-art methods on compositional zero-shot benchmark tasks. To improve VLMs, we propose a novel form of soft prompting. We treat the attributes and objects that are composed to define classes as learnable tokens of vocabulary and tune them on multiple prompt compositions. During inference, we recompose the learned attribute-object vocabulary in new combinations and show that CSP outperforms the original VLM on benchmark datasets by an average of 14.7 percentage points of accuracy. CSP also achieves new state-of-the-art accuracies on two out of three benchmark datasets, while only fine-tuning a small number of parameters. Further, we show that CSP improves generalization to higher-order attribute-attribute-object compositions and combinations of pretrained attributes and fine-tuned objects.
    Risk-based regulation for all: The need and a method for a wide adoption solution for data-driven inspection targeting. (arXiv:2204.03583v1 [cs.LG])
    Access to data and data processing, including the use of machine learning techniques, has become significantly easier and cheaper in recent years. Nevertheless, solutions that can be widely adopted by regulators for market monitoring and inspection targeting in a data-driven way have not been frequently discussed by the scientific community. This article discusses the need for such solutions and the difficulties in developing them, presents an effective method to address regulation planning, and illustrates its use to account for the most important and common subject for the majority of regulators: the consumer. This article hopes to contribute to raising the regulatory community's awareness of the need for data processing methods that are objective, impartial, transparent, explainable, simple to implement and of low computational cost, aiming at the implementation of risk-based regulation worldwide.
    BERTuit: Understanding Spanish language in Twitter through a native transformer. (arXiv:2204.03465v1 [cs.CL])
    The appearance of complex attention-based language models such as BERT, RoBERTa or GPT-3 has made it possible to address highly complex tasks in a plethora of scenarios. However, when applied to specific domains, these models encounter considerable difficulties. This is the case of social networks such as Twitter, an ever-changing stream of information written in informal and complex language, where each message requires careful evaluation to be understood even by humans given the important role that context plays. Addressing tasks in this domain through Natural Language Processing involves severe challenges. When powerful state-of-the-art multilingual language models are applied to this scenario, language-specific nuances tend to get lost in translation. To face these challenges we present \textbf{BERTuit}, the largest transformer proposed so far for the Spanish language, pre-trained on a massive dataset of 230M Spanish tweets using RoBERTa optimization. Our motivation is to provide a powerful resource to better understand Spanish Twitter and to be used in applications focused on this social network, with special emphasis on solutions devoted to tackling the spread of misinformation on this platform. BERTuit is evaluated on several tasks and compared against M-BERT, XLM-RoBERTa and XLM-T, very competitive multilingual transformers. The utility of our approach is shown with applications, in this case: a zero-shot methodology to visualize groups of hoaxes and profiling of authors spreading disinformation. Misinformation spreads wildly on platforms such as Twitter in languages other than English, meaning the performance of transformers may suffer when transferred outside English-speaking communities.
    Learning to Solve Travelling Salesman Problem with Hardness-adaptive Curriculum. (arXiv:2204.03236v1 [cs.LG])
    Various neural network models have been proposed to tackle combinatorial optimization problems such as the travelling salesman problem (TSP). Existing learning-based TSP methods adopt a simple setting in which the training and testing data are independent and identically distributed. However, the existing literature fails to solve TSP instances when training and testing data have different distributions. Concretely, we find that different training and testing distributions result in more difficult TSP instances, i.e., the solution obtained by the model has a large gap from the optimal solution. To tackle this problem, in this work, we study learning-based TSP methods when training and testing data have different distributions using adaptive hardness, i.e., how difficult a TSP instance can be for a solver. This problem is challenging because it is non-trivial to (1) define a hardness measurement quantitatively; (2) efficiently and continuously generate sufficiently hard TSP instances during model training; (3) fully utilize instances with different levels of hardness to learn a more powerful TSP solver. To solve these challenges, we first propose a principled hardness measurement to quantify the hardness of TSP instances. Then, we propose a hardness-adaptive generator to generate instances with different levels of hardness. We further propose a curriculum learner that fully utilizes these instances to train the TSP solver. Experiments show that our hardness-adaptive generator can generate instances ten times harder than the existing methods, and our proposed method achieves significant improvement over state-of-the-art models in terms of the optimality gap.
    What You See is What You Get: Distributional Generalization for Algorithm Design in Deep Learning. (arXiv:2204.03230v1 [cs.LG])
    We investigate and leverage a connection between Differential Privacy (DP) and the recently proposed notion of Distributional Generalization (DG). Applying this connection, we introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD). First, we prove that differentially private methods satisfy a "What You See Is What You Get (WYSIWYG)" generalization guarantee: whatever a model does on its train data is almost exactly what it will do at test time. This guarantee is formally captured by distributional generalization. WYSIWYG enables principled algorithm design in deep learning by reducing $\textit{generalization}$ concerns to $\textit{optimization}$ ones: in order to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the train data. This is notably false for standard (non-DP) methods, hence this observation has applications even when privacy is not required. For example, importance sampling is known to fail for standard SGD, but we show that it has exactly the intended effect for DP-trained models. Thus, with DP-SGD, unlike with SGD, we can influence test-time behavior by making principled train-time interventions. We use these insights to construct simple algorithms which match or outperform SOTA in several distributional robustness applications, and to significantly improve the privacy vs. disparate impact trade-off of DP-SGD. Finally, we also improve on known theoretical bounds relating differential privacy, stability, and distributional generalization.
    Generalised Latent Assimilation in Heterogeneous Reduced Spaces with Machine Learning Surrogate Models. (arXiv:2204.03497v1 [cs.LG])
    Reduced-order modelling and low-dimensional surrogate models generated using machine learning algorithms have been widely applied in high-dimensional dynamical systems to improve algorithmic efficiency. In this paper, we develop a system which combines reduced-order surrogate models with a novel data assimilation (DA) technique used to incorporate real-time observations from different physical spaces. We make use of local smooth surrogate functions which link the space of encoded system variables and that of current observations to perform variational DA with a low computational cost. The new system, named Generalised Latent Assimilation, benefits from both the efficiency provided by reduced-order modelling and the accuracy of data assimilation. A theoretical analysis of the difference between the surrogate and the original assimilation cost functions is also provided in this paper, in which an upper bound, depending on the size of the local training set, is given. The new approach is tested on a high-dimensional CFD application of a two-phase liquid flow with non-linear observation operators that current Latent Assimilation methods cannot handle. Numerical results demonstrate that the proposed assimilation approach can significantly improve the reconstruction and prediction accuracy of the deep learning surrogate model, which is nearly 1000 times faster than the CFD simulation.
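    The latent variational cost such a system minimizes can be illustrated with a toy example (a minimal NumPy sketch; the linear surrogate `H`, the background state and the observation are hypothetical stand-ins for the learned local surrogate functions):

```python
import numpy as np

# Toy latent 3D-Var: background latent state z_b, observation y, and a
# local surrogate h(z) = H z standing in for the decoder + observation
# operator, so the assimilation cost is cheap to differentiate.
z_b = np.array([1.0, -0.5])              # encoded background state
H = np.array([[2.0, 0.0], [0.0, 1.0]])   # hypothetical local surrogate
y = np.array([3.0, 0.5])                 # current observation

def cost(z):
    """Variational DA cost: background misfit + observation misfit."""
    return np.sum((z - z_b) ** 2) + np.sum((H @ z - y) ** 2)

def grad(z):
    return 2 * (z - z_b) + 2 * H.T @ (H @ z - y)

z = z_b.copy()
for _ in range(200):                     # plain gradient descent
    z -= 0.05 * grad(z)

print(cost(z) < cost(z_b))  # the analysis improves on the background
```

    In the paper's setting the surrogate is a local polynomial fitted around the current state, and the quality of the analysis is bounded in terms of the size of the local training set.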
    Explicit Feature Interaction-aware Graph Neural Networks. (arXiv:2204.03225v1 [cs.LG])
    Graph neural networks are powerful methods for handling graph-structured data. However, existing graph neural networks only learn higher-order feature interactions implicitly. Thus, they cannot capture the information contained in low-order feature interactions. To overcome this problem, we propose the Explicit Feature Interaction-aware Graph Neural Network (EFI-GNN), which explicitly learns arbitrary-order feature interactions. EFI-GNN can be learned jointly with any other graph neural network. We demonstrate that the joint learning method consistently enhances performance on various node classification tasks. Furthermore, since EFI-GNN is inherently a linear model, we can interpret its prediction results. Using its computation rule, we can obtain the effect of a feature of any order on the decision. We thereby visualize the effects of first-order and second-order features in the form of heatmaps.
    Half-sibling regression meets exoplanet imaging: PSF modeling and subtraction using a flexible, domain knowledge-driven, causal framework. (arXiv:2204.03439v1 [astro-ph.IM])
    High-contrast imaging of exoplanets hinges on powerful post-processing methods to denoise the data and separate the signal of a companion from its host star, which is typically orders of magnitude brighter. Existing post-processing algorithms do not use all prior domain knowledge that is available about the problem. We propose a new method that builds on our understanding of the systematic noise and the causal structure of the data-generating process. Our algorithm is based on a modified version of half-sibling regression (HSR), a flexible denoising framework that combines ideas from the fields of machine learning and causality. We adapt the method to address the specific requirements of high-contrast exoplanet imaging data obtained in pupil tracking mode. The key idea is to estimate the systematic noise in a pixel by regressing the time series of this pixel onto a set of causally independent, signal-free predictor pixels. We use regularized linear models in this work; however, other (non-linear) models are also possible. In a second step, we demonstrate how the HSR framework allows us to incorporate observing conditions such as wind speed or air temperature as additional predictors. When we apply our method to four data sets from the VLT/NACO instrument, our algorithm provides a better false-positive fraction than PCA-based PSF subtraction, a popular baseline method in the field. Additionally, we find that the HSR-based method provides direct and accurate estimates for the contrast of the exoplanets without the need to insert artificial companions for calibration in the data sets. Finally, we present first evidence that using the observing conditions as additional predictors can improve the results. Our HSR-based method provides an alternative, flexible and promising approach to the challenge of modeling and subtracting the stellar PSF and systematic noise in exoplanet imaging data.
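    The core HSR regression step can be sketched on synthetic data (an illustrative sketch, not the authors' pipeline; the injected companion signal and the noise model are made up for the demonstration):

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_pred = 200, 10

# Shared systematic noise drives both the target pixel and a set of
# causally independent, signal-free predictor pixels on the detector.
systematics = rng.normal(size=(n_frames, 3))
predictors = systematics @ rng.normal(size=(3, n_pred)) \
    + 0.1 * rng.normal(size=(n_frames, n_pred))
signal = np.zeros(n_frames)
signal[80:120] = 0.5                      # hypothetical companion signal
target = systematics @ np.array([1.0, -0.5, 0.3]) + signal

def hsr_residual(target, predictors, lam=1.0):
    """Half-sibling regression with a ridge model: predict the target
    pixel's time series from predictor pixels that share only the
    systematic noise, then subtract the prediction so that the
    astrophysical signal survives in the residual."""
    X = predictors
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ target)
    return target - X @ w

residual = hsr_residual(target, predictors)
# Most of the systematics are removed; the injected signal survives.
print(residual[80:120].mean(), residual[:80].std())
```

    The paper's extension adds observing conditions (wind speed, air temperature) as further columns of the predictor matrix.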
    mulEEG: A Multi-View Representation Learning on EEG Signals. (arXiv:2204.03272v1 [cs.LG])
    Modeling effective representations using multiple views that positively influence each other is challenging, and existing methods perform poorly on Electroencephalogram (EEG) signals for sleep-staging tasks. In this paper, we propose a novel multi-view self-supervised method (mulEEG) for unsupervised EEG representation learning. Our method attempts to effectively utilize the complementary information available in multiple views to learn better representations. We introduce a diverse loss that further encourages complementary information across multiple views. Our method, with no access to labels, beats supervised training while outperforming multi-view baseline methods in transfer learning experiments carried out on sleep-staging tasks. We posit that our method was able to learn better representations by using complementary multi-views.
    Perceive, Represent, Generate: Translating Multimodal Information to Robotic Motion Trajectories. (arXiv:2204.03051v1 [cs.RO])
    We present Perceive-Represent-Generate (PRG), a novel three-stage framework that maps perceptual information of different modalities (e.g., visual or sound), corresponding to a sequence of instructions, to an adequate sequence of movements to be executed by a robot. In the first stage, we perceive and pre-process the given inputs, isolating individual commands from the complete instruction provided by a human user. In the second stage we encode the individual commands into a multimodal latent space, employing a deep generative model. Finally, in the third stage we convert the multimodal latent values into individual trajectories and combine them into a single dynamic movement primitive, allowing its execution in a robotic platform. We evaluate our pipeline in the context of a novel robotic handwriting task, where the robot receives as input a word through different perceptual modalities (e.g., image, sound), and generates the corresponding motion trajectory to write it, creating coherent and readable handwritten words.
    DiffCloud: Real-to-Sim from Point Clouds with Differentiable Simulation and Rendering of Deformable Objects. (arXiv:2204.03139v1 [cs.RO])
    Research in manipulation of deformable objects is typically conducted on a limited range of scenarios, because handling each scenario on hardware takes significant effort. Realistic simulators with support for various types of deformations and interactions have the potential to speed up experimentation with novel tasks and algorithms. However, for highly deformable objects it is challenging to align the output of a simulator with the behavior of real objects. Manual tuning is not intuitive, hence automated methods are needed. We view this alignment problem as a joint perception-inference challenge and demonstrate how to use recent neural network architectures to successfully perform simulation parameter inference from real point clouds. We analyze the performance of various architectures, comparing their data and training requirements. Furthermore, we propose to leverage differentiable point cloud sampling and differentiable simulation to significantly reduce the time to achieve the alignment. We employ an efficient way to propagate gradients from point clouds to simulated meshes and further through to the physical simulation parameters, such as mass and stiffness. Experiments with highly deformable objects show that our method can achieve comparable or better alignment with real object behavior, while reducing the time needed to achieve this by more than an order of magnitude. Videos and supplementary material are available at https://tinyurl.com/diffcloud.
    Machine Learning-Enabled IoT Security: Open Issues and Challenges Under Advanced Persistent Threats. (arXiv:2204.03433v1 [cs.CR])
    Despite its technological benefits, the Internet of Things (IoT) has cyber weaknesses due to vulnerabilities in the wireless medium. Machine learning (ML)-based methods are widely used against cyber threats in IoT networks and show promising performance. Advanced persistent threats (APTs) are a prominent way for cybercriminals to compromise networks, and they are notable for their long-term and harmful characteristics. However, it is difficult for ML-based approaches to achieve promising detection performance on APT attacks, because such attacks constitute an extremely small percentage of normal traffic. There are limited surveys that fully investigate APT attacks in IoT networks, due to the lack of public datasets covering all types of APT attacks. It is therefore worthwhile to bridge the state of the art in network attack detection with APT attack detection in a comprehensive review article. This survey article reviews the security challenges in IoT networks and presents well-known attacks, APT attacks, and threat models in IoT systems. Meanwhile, signature-based, anomaly-based, and hybrid intrusion detection systems are summarized for IoT networks. The article highlights statistical insights regarding frequently applied ML-based methods against network intrusion, alongside the number of attack types detected. Finally, open issues and challenges for common network intrusion and APT attacks are presented for future research.
    Few-Shot Forecasting of Time-Series with Heterogeneous Channels. (arXiv:2204.03456v1 [cs.LG])
    Learning complex time series forecasting models usually requires a large amount of data, as each model is trained from scratch for each task/data set. Leveraging learning experience with similar datasets is a well-established technique for classification problems called few-shot classification. However, existing approaches cannot be applied to time-series forecasting because i) multivariate time-series datasets have different channels and ii) forecasting is principally different from classification. In this paper we formalize the problem of few-shot forecasting of time-series with heterogeneous channels for the first time. Extending recent work on heterogeneous attributes in vector data, we develop a model composed of permutation-invariant deep set-blocks which incorporate a temporal embedding. We assemble the first meta-dataset of 40 multivariate time-series datasets and show through experiments that our model provides a good generalization, outperforming baselines carried over from simpler scenarios that either fail to learn across tasks or miss temporal information.
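    A permutation-invariant deep-set block of the kind described above can be sketched in NumPy (an illustrative sketch under assumed shapes, not the paper's architecture; the temporal embedding is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(4, 16))   # shared per-channel encoder weights
W_rho = rng.normal(size=(16, 8))   # post-pooling weights

def set_block(channels):
    """Permutation-invariant deep-set block: every channel (a feature
    vector here) passes through the same encoder phi, the results are
    sum-pooled so channel order is irrelevant, then rho maps the pooled
    vector to the output. This lets one model ingest time series with
    different numbers and orderings of channels."""
    h = np.maximum(channels @ W_phi, 0.0)   # phi, shared across channels
    pooled = h.sum(axis=0)                  # permutation-invariant pooling
    return np.maximum(pooled @ W_rho, 0.0)  # rho

x = rng.normal(size=(5, 4))               # 5 channels, 4 features each
out = set_block(x)
out_perm = set_block(x[[3, 1, 4, 0, 2]])  # shuffled channel order
print(np.allclose(out, out_perm))         # output ignores channel order
```

    Because the pooling is a sum, the same weights also accept a dataset with a different channel count, which is what makes cross-dataset meta-learning possible.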
    Domain Adaptation for Time-Series Classification to Mitigate Covariate Shift. (arXiv:2204.03342v1 [cs.LG])
    The performance of a machine learning model degrades when it is applied to data from a similar but different domain than the data it was initially trained on. To mitigate this domain shift problem, domain adaptation (DA) techniques search for an optimal transformation that converts the (current) input data from a source domain to a target domain to learn domain-invariant representations that reduce domain discrepancy. This paper proposes a novel supervised domain adaptation based on two steps. First, we search for an optimal class-dependent transformation from the source to the target domain from a few samples. We consider optimal transport methods such as the earth mover's distance with Laplacian regularization, Sinkhorn transport and correlation alignment. Second, we use embedding similarity techniques to select the corresponding transformation at inference time. We use correlation metrics and maximum mean discrepancy with higher-order moment matching techniques. We conduct an extensive evaluation on time-series datasets with domain shift, including simulated and various online handwriting datasets, to demonstrate the performance.
    Energy-Efficient Adaptive Machine Learning on IoT End-Nodes With Class-Dependent Confidence. (arXiv:2204.03431v1 [cs.LG])
    Energy-efficient machine learning models that can run directly on edge devices are of great interest in IoT applications, as they can reduce network pressure and response latency, and improve privacy. An effective way to obtain energy efficiency with small accuracy drops is to sequentially execute a set of increasingly complex models, early-stopping the procedure for "easy" inputs that can be confidently classified by the smallest models. As a stopping criterion, current methods employ a single threshold on the output probabilities produced by each model. In this work, we show that such a criterion is sub-optimal for datasets that include classes of different complexity, and we demonstrate a more general approach based on per-class thresholds. With experiments on a low-power end-node, we show that our method can significantly reduce the energy consumption compared to the single-threshold approach.
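    The per-class stopping criterion can be sketched as a small cascade (a minimal sketch; the two "models" are hypothetical stand-ins returning fixed class probabilities):

```python
def cascade_predict(models, x, thresholds):
    """Run increasingly complex models and stop as soon as the current
    model's top class clears that class's own confidence threshold.
    A single global threshold is the special case where all entries of
    `thresholds` are equal; per-class values let "easy" classes exit
    earlier than intrinsically harder ones."""
    for model in models[:-1]:
        probs = model(x)
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= thresholds[top]:   # per-class stopping criterion
            return top, model
    probs = models[-1](x)                   # largest model always answers
    return max(range(len(probs)), key=probs.__getitem__), models[-1]

# Hypothetical two-stage cascade over three classes.
small = lambda x: [0.80, 0.15, 0.05]   # cheap model, confident in class 0
large = lambda x: [0.10, 0.85, 0.05]   # expensive model
label, used = cascade_predict([small, large], None, thresholds=[0.7, 0.9, 0.9])
print(label)  # 0: the small model clears the (lower) class-0 threshold
```

    Raising the class-0 threshold above 0.80 would instead forward this input to the large model, trading energy for confidence on that class only.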
    PALBERT: Teaching ALBERT to Ponder. (arXiv:2204.03276v1 [cs.LG])
    Currently, pre-trained models can be considered the default choice for a wide range of NLP tasks. Despite their SoTA results, there is practical evidence that these models may require a different number of computing layers for different input sequences, since evaluating all layers leads to overconfidence in wrong predictions (namely, overthinking). This problem can potentially be solved by implementing adaptive computation time approaches, which were first designed to improve inference speed. The recently proposed PonderNet may be a promising solution for performing an early exit by treating the exit layer's index as a latent variable. However, the originally proposed exit criterion, which relies on sampling from the trained posterior distribution over the probability of exiting from the i-th layer, introduces major variance in model outputs, significantly reducing the resulting model's performance. In this paper, we propose Ponder ALBERT (PALBERT): an improvement to PonderNet with a novel deterministic Q-exit criterion and a revisited model architecture. We compare PALBERT with recent methods for performing an early exit. We observe that the proposed changes can be considered significant improvements on the original PonderNet architecture and outperform PABEE on a wide range of GLUE tasks. In addition, we also perform an in-depth ablation study of the proposed architecture to further understand Lambda layers and their performance.
    Continual Inference: A Library for Efficient Online Inference with Deep Neural Networks in PyTorch. (arXiv:2204.03418v1 [cs.LG])
    We present Continual Inference, a Python library for implementing Continual Inference Networks (CINs) in PyTorch, a class of Neural Networks designed specifically for efficient inference in both online and batch processing scenarios. We offer a comprehensive introduction and guide to CINs and their implementation in practice, and provide best-practices and code examples for composing complex modules for modern Deep Learning. Continual Inference is readily downloadable via the Python Package Index and at \url{www.github.com/lukashedegaard/continual-inference}.
    Using Decision Tree as Local Interpretable Model in Autoencoder-based LIME. (arXiv:2204.03321v1 [cs.LG])
    Nowadays, deep neural networks are used in many domains because of their high accuracy. However, they are considered "black boxes", meaning that they are not explainable to humans. On the other hand, in tasks such as medical diagnosis, economics, and self-driving cars, users want the model to be interpretable so they can decide whether to trust its results. In this work, we present a modified version of an autoencoder-based approach for local interpretability called ALIME. ALIME itself is inspired by a well-known method called Local Interpretable Model-agnostic Explanations (LIME). LIME generates a single instance-level explanation by generating new data around the instance and training a local linear interpretable model. ALIME uses an autoencoder to weigh the new data around the sample. Nevertheless, ALIME uses a linear model as the interpretable model to be trained locally, just like LIME. This work proposes a new approach that uses a decision tree instead of the linear model as the interpretable model. We evaluate the proposed model in terms of stability, local fidelity, and interpretability on different datasets. Compared to ALIME, the experiments show significant improvements in stability and local fidelity, and improved results on interpretability.
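    The proposed swap of the local linear model for a decision tree can be illustrated with a hand-rolled depth-1 tree (a minimal sketch; the black-box labels and the sample weights are hypothetical stand-ins for the model being explained and for ALIME's autoencoder-based weighting):

```python
import numpy as np

def fit_weighted_stump(X, y, w):
    """Depth-1 regression tree (a stump) fitted with sample weights:
    the simplest decision-tree surrogate one could drop into ALIME in
    place of the local linear model. Returns (feature, threshold,
    left value, right value) minimizing the weighted squared error."""
    best = (None, None, None, None, np.inf)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = X[:, j] <= t, X[:, j] > t
            if not left.any() or not right.any():
                continue
            lv = np.average(y[left], weights=w[left])
            rv = np.average(y[right], weights=w[right])
            err = (w[left] * (y[left] - lv) ** 2).sum() \
                + (w[right] * (y[right] - rv) ** 2).sum()
            if err < best[4]:
                best = (j, t, lv, rv, err)
    return best[:4]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # perturbations around the sample
y = (X[:, 1] > 0).astype(float)          # black-box output: feature 1 only
w = np.exp(-np.sum(X ** 2, axis=1))      # stand-in for latent-space weights

feat, thr, lv, rv = fit_weighted_stump(X, y, w)
print(feat)  # 1: the stump exposes the locally important feature
```

    Unlike a linear surrogate, the tree's split structure directly names the feature and threshold that drive the local decision, which is the interpretability gain the abstract claims.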
    Fusing finetuned models for better pretraining. (arXiv:2204.03044v1 [cs.CL])
    Pretrained models are the standard starting point for training. This approach consistently outperforms the use of a random initialization. However, pretraining is a costly endeavour that few can undertake. In this paper, we create better base models at hardly any cost by fusing multiple existing fine-tuned models into one. Specifically, we fuse by averaging the weights of these models. We show that the fused model's results surpass those of the pretrained model. We also show that fusing is often better than intertraining. We find that fusing is less dependent on the target task. Furthermore, weight decay nullifies intertraining effects but not those of fusing.
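    Fusing by weight averaging is simple enough to sketch directly (an illustrative sketch with hypothetical two-parameter checkpoints; real checkpoints would share an architecture and hence identical state-dict keys):

```python
import numpy as np

def fuse(state_dicts):
    """Fuse several fine-tuned checkpoints of the same architecture by
    averaging their weights parameter-by-parameter, yielding a single
    base model for subsequent fine-tuning."""
    keys = state_dicts[0].keys()
    return {k: np.mean([sd[k] for sd in state_dicts], axis=0) for k in keys}

# Hypothetical checkpoints from two different fine-tuning runs.
model_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
model_b = {"w": np.array([3.0, 4.0]), "b": np.array([2.0])}
fused = fuse([model_a, model_b])
print(fused["w"], fused["b"])  # [2. 3.] [1.]
```

    The averaging itself costs one pass over the parameters, which is why the paper can describe the resulting base models as obtained "at hardly any cost".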
    Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators. (arXiv:2204.03243v1 [cs.CL])
    We present a new framework AMOS that pretrains text encoders with an Adversarial learning curriculum via a Mixture Of Signals from multiple auxiliary generators. Following ELECTRA-style pretraining, the main encoder is trained as a discriminator to detect replaced tokens generated by auxiliary masked language models (MLMs). Different from ELECTRA which trains one MLM as the generator, we jointly train multiple MLMs of different sizes to provide training signals at various levels of difficulty. To push the discriminator to learn better with challenging replaced tokens, we learn mixture weights over the auxiliary MLMs' outputs to maximize the discriminator loss by backpropagating the gradient from the discriminator via Gumbel-Softmax. For better pretraining efficiency, we propose a way to assemble multiple MLMs into one unified auxiliary model. AMOS outperforms ELECTRA and recent state-of-the-art pretrained models by about 1 point on the GLUE benchmark for BERT base-sized models.
    A Joint Learning Approach for Semi-supervised Neural Topic Modeling. (arXiv:2204.03208v1 [cs.IR])
    Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend upon these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the best of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled-data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.
    Distributed Statistical Min-Max Learning in the Presence of Byzantine Agents. (arXiv:2204.03187v1 [cs.LG])
    Recent years have witnessed a growing interest in the topic of min-max optimization, owing to its relevance in the context of generative adversarial networks (GANs), robust control and optimization, and reinforcement learning. Motivated by this line of work, we consider a multi-agent min-max learning problem, and focus on the emerging challenge of contending with worst-case Byzantine adversarial agents in such a setup. By drawing on recent results from robust statistics, we design a robust distributed variant of the extra-gradient algorithm - a popular algorithmic approach for min-max optimization. Our main contribution is to provide a crisp analysis of the proposed robust extra-gradient algorithm for smooth convex-concave and smooth strongly convex-strongly concave functions. Specifically, we establish statistical rates of convergence to approximate saddle points. Our rates are near-optimal, and reveal both the effect of adversarial corruption and the benefit of collaboration among the non-faulty agents. Notably, this is the first paper to provide formal theoretical guarantees for large-scale distributed min-max learning in the presence of adversarial agents.
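    The extra-gradient step at the core of the proposed algorithm works as follows (a minimal sketch on a bilinear toy saddle problem; the Byzantine-robust aggregation of gradients across agents, which is the paper's contribution, is omitted):

```python
def extra_gradient(grad_x, grad_y, x, y, eta=0.2, steps=500):
    """Extra-gradient for min_x max_y f(x, y): take a probe step to a
    midpoint, then update the iterate using the gradients evaluated at
    that midpoint. The look-ahead prevents the cycling that plain
    gradient descent-ascent exhibits on saddle problems."""
    for _ in range(steps):
        xm = x - eta * grad_x(x, y)      # probe (extrapolation) step
        ym = y + eta * grad_y(x, y)
        x = x - eta * grad_x(xm, ym)     # update with midpoint gradients
        y = y + eta * grad_y(xm, ym)
    return x, y

# Bilinear saddle f(x, y) = x * y with saddle point (0, 0).
gx = lambda x, y: y      # df/dx
gy = lambda x, y: x      # df/dy
x, y = extra_gradient(gx, gy, 1.0, 1.0)
print(abs(x) < 1e-3 and abs(y) < 1e-3)  # converges to the saddle point
```

    In the distributed setting, each `grad` call would be replaced by a robust aggregate of the agents' reported gradients, which is where the Byzantine resilience enters.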
    Transformer-Based Language Models for Software Vulnerability Detection: Performance, Model's Security and Platforms. (arXiv:2204.03214v1 [cs.CR])
    Large transformer-based language models demonstrate excellent performance in natural language processing. Considering the closeness of natural languages to high-level programming languages such as C/C++, this work studies how well large transformer-based language models detect software vulnerabilities. Our results demonstrate the good performance of these models on software vulnerability detection. This answer enables extending transformer-based language models to vulnerability detection and leveraging their superior performance beyond the natural language processing domain. Besides, we perform a security check of the models using Microsoft's Counterfit, a command-line tool to assess a model's security. Our results show that these models are vulnerable to adversarial examples. In this regard, we present a simple countermeasure and its results. Experimenting with large models is always a challenge due to the required computing resources and platforms/libraries & dependencies. Based on the experiences and difficulties we faced during this work, we present our recommendations for choosing the platforms to run these large models. Moreover, the popular platforms are surveyed thoroughly in this paper.
    Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes. (arXiv:2204.03376v1 [cs.LG])
    Hybrid closed loop systems represent the future of care for people with type 1 diabetes (T1D). These devices usually utilise simple control algorithms to select the optimal insulin dose for maintaining blood glucose levels within a healthy range. Online reinforcement learning (RL) has been utilised as a method for further enhancing glucose control in these devices. Previous approaches have been shown to reduce patient risk and improve time spent in the target range when compared to classical control algorithms, but are prone to instability in the learning process, often resulting in the selection of unsafe actions. This work presents an evaluation of offline RL as a means for developing clinically effective dosing policies without the need for patient interaction. This paper examines the utility of BCQ, CQL and TD3-BC in managing the blood glucose of nine virtual patients within the UVA/Padova glucose dynamics simulator. When trained on less than a tenth of the data required by online RL approaches, this work shows that offline RL can significantly increase time in the healthy blood glucose range when compared to the strongest state-of-the-art baseline. This is achieved without any associated increase in low blood glucose events. Offline RL is also shown to be able to correct for common and challenging scenarios such as incorrect bolus dosing, irregular meal timings and sub-optimal training data.
    Graph Neural Networks Designed for Different Graph Types: A Survey. (arXiv:2204.03080v1 [cs.LG])
    Graphs are ubiquitous in nature and can therefore serve as models for many practical but also theoretical problems. Based on this, the young research field of Graph Neural Networks (GNNs) has emerged. Despite the youth of the field and the speed at which new models are developed, many good surveys have been published in recent years. Nevertheless, an overview of which graph types can be modeled by GNNs is missing. In this survey, we give a detailed overview of existing GNNs and, unlike previous surveys, categorize them according to their ability to handle different graph types. We consider GNNs operating on static as well as on dynamic graphs of different structural constitutions, with or without node or edge attributes. Moreover, in the dynamic case, we separate the models into discrete-time and continuous-time dynamic graph models based on their architecture. According to our findings, there are still graph types that are not covered by existing GNN models. Specifically, models concerning heterogeneity in attributes are missing, and the deletion of nodes and edges is only rarely covered.
    Jacobian Norm for Unsupervised Source-Free Domain Adaptation. (arXiv:2204.03467v1 [cs.LG])
    Unsupervised Source (data) Free domain adaptation (USFDA) aims to transfer knowledge from a well-trained source model to a related but unlabeled target domain. In such a scenario, all conventional adaptation methods that require source data fail. To combat this challenge, existing USFDAs turn to transferring knowledge by aligning the target features to the latent distribution hidden in the source model. However, such information is naturally limited. Thus, the alignment in such a scenario is not only difficult but also insufficient, which degrades the target generalization performance. To relieve this dilemma in current USFDAs, we are motivated to explore a new perspective to boost their performance. For this purpose, and to gain the necessary insight, we look back upon the origin of domain adaptation and first theoretically derive a brand-new target generalization error bound based on model smoothness. Then, following this theoretical insight, a general and model-smoothness-guided Jacobian norm (JN) regularizer is designed and imposed on the target domain to mitigate this dilemma. Extensive experiments are conducted to validate its effectiveness. In its implementation, just a few lines of code added to existing USFDAs achieve superior results on various benchmark datasets.
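    The quantity such a regularizer penalizes, the norm of the input-output Jacobian, can be estimated numerically (an illustrative finite-difference sketch on a hypothetical linear model, not the paper's implementation, which would use automatic differentiation):

```python
import numpy as np

def jacobian_norm(f, x, eps=1e-5):
    """Squared Frobenius norm of the input-output Jacobian of f at x,
    estimated with forward finite differences. Penalizing this norm on
    target samples encourages a smoother model around them, which is
    the intuition behind the model-smoothness-guided JN regularizer."""
    fx = f(x)
    J = np.stack([(f(x + eps * e) - fx) / eps
                  for e in np.eye(x.size)], axis=1)
    return float(np.sum(J ** 2))

# For a linear model f(x) = W x, the Jacobian is W itself, so the
# squared Frobenius norm is the sum of squared weights: 2^2 + 3^2 = 13.
W = np.array([[2.0, 0.0], [0.0, 3.0]])
f = lambda x: W @ x
print(jacobian_norm(f, np.zeros(2)))
```

    In practice the regularizer would be added to the existing USFDA loss and differentiated through, which is why the paper describes the change as a few added lines of code.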
    Multi-Task Distributed Learning using Vision Transformer with Random Patch Permutation. (arXiv:2204.03500v1 [cs.LG])
    The widespread application of artificial intelligence in health research is currently hampered by limitations in data availability. Distributed learning methods such as federated learning (FL) and split learning (SL) have been introduced to solve this problem, as well as data management and ownership issues, each with its own strengths and weaknesses. The recent proposal of federated split task-agnostic (FeSTA) learning tries to reconcile the distinct merits of FL and SL by enabling multi-task collaboration between participants through a Vision Transformer (ViT) architecture, but it suffers from high communication overhead. To address this, we present p-FeSTA, a multi-task distributed learning method using a ViT with random patch permutation. Instead of using a CNN-based head as in FeSTA, p-FeSTA adopts a simple patch embedder with random permutation, improving multi-task learning performance without sacrificing privacy. Experimental results confirm that the proposed method significantly enhances the benefits of multi-task collaboration, communication efficiency, and privacy preservation, shedding light on practical multi-task distributed learning in the field of medical imaging.
    Self-Supervised Learning to Prove Equivalence Between Programs via Semantics-Preserving Rewrite Rules. (arXiv:2109.10476v2 [cs.LG] UPDATED)
    We target the problem of automatically synthesizing proofs of semantic equivalence between two programs made of sequences of statements. We represent programs using abstract syntax trees (ASTs), where a given set of semantics-preserving rewrite rules can be applied to a specific AST pattern to generate a transformed and semantically equivalent program. In our system, two programs are equivalent if there exists a sequence of applications of these rewrite rules that rewrites one program into the other. We propose a neural network architecture based on a transformer model to generate proofs of equivalence between program pairs. The system outputs a sequence of rewrites, and the validity of the sequence is checked simply by verifying that it can be applied. If no valid sequence is produced by the neural network, the system reports the programs as non-equivalent, ensuring by design that no programs can be incorrectly reported as equivalent. Our system is fully implemented for a given grammar. To efficiently train the system to generate such sequences, we develop an original incremental training technique, named self-supervised sample selection. We extensively study the effectiveness of this novel training approach on proofs of increasing complexity and length. Our system, S4Eq, achieves 97% proof success on a curated dataset of 10,000 pairs of equivalent programs.
    RF Signal Transformation and Classification using Deep Neural Networks. (arXiv:2204.03564v1 [eess.SP])
    Deep neural networks (DNNs) designed for computer vision and natural language processing tasks cannot be directly applied to radio frequency (RF) datasets. To address this challenge, we propose converting the raw RF data to data types that are suitable for off-the-shelf DNNs by introducing a convolutional transform technique. In addition, we propose a simple 5-layer convolutional neural network architecture (CONV-5) that can operate on raw RF I/Q data without any transformation. Further, we put forward an RF dataset, referred to as RF1024, to facilitate future RF research. RF1024 consists of 8 different RF modulation classes, with each class having 1000/200 training/test samples; each sample contains 1024 complex I/Q values. Lastly, experiments are performed on the RadioML2016 and RF1024 datasets to demonstrate the improved classification performance.
    AUV-Net: Learning Aligned UV Maps for Texture Transfer and Synthesis. (arXiv:2204.03105v1 [cs.CV])
    In this paper, we address the problem of texture representation for 3D shapes for the challenging and underexplored tasks of texture transfer and synthesis. Previous works either apply spherical texture maps which may lead to large distortions, or use continuous texture fields that yield smooth outputs lacking details. We argue that the traditional way of representing textures with images and linking them to a 3D mesh via UV mapping is more desirable, since synthesizing 2D images is a well-studied problem. We propose AUV-Net which learns to embed 3D surfaces into a 2D aligned UV space, by mapping the corresponding semantic parts of different 3D shapes to the same location in the UV space. As a result, textures are aligned across objects, and can thus be easily synthesized by generative models of images. Texture alignment is learned in an unsupervised manner by a simple yet effective texture alignment module, taking inspiration from traditional works on linear subspace learning. The learned UV mapping and aligned texture representations enable a variety of applications including texture transfer, texture synthesis, and textured single view 3D reconstruction. We conduct experiments on multiple datasets to demonstrate the effectiveness of our method. Project page: https://nv-tlabs.github.io/AUV-NET.
    FedADMM: A Robust Federated Deep Learning Framework with Adaptivity to System Heterogeneity. (arXiv:2204.03529v1 [cs.LG])
    Federated Learning (FL) is an emerging framework for distributed processing of large data volumes by edge devices subject to limited communication bandwidths, heterogeneity in data distributions and computational resources, as well as privacy considerations. In this paper, we introduce a new FL protocol termed FedADMM, based on primal-dual optimization. The proposed method leverages dual variables to tackle statistical heterogeneity, and accommodates system heterogeneity by tolerating variable amounts of work performed by clients. FedADMM maintains the same communication cost per round as FedAvg/Prox, and generalizes them via the augmented Lagrangian. A convergence proof is established for nonconvex objectives, with no restrictions on data dissimilarity or on the number of participants per round of the algorithm. We demonstrate its merits through extensive experiments on real datasets, under both IID and non-IID data distributions across clients. FedADMM consistently outperforms all baseline methods in terms of communication efficiency, with the number of rounds needed to reach a prescribed accuracy reduced by up to 87%. The algorithm effectively adapts to heterogeneous data distributions through the use of dual variables, without the need for hyperparameter tuning, and its advantages are more pronounced in large-scale systems.
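    As a rough illustration of the primal-dual mechanism described above, the sketch below runs consensus ADMM on a toy federated least-squares problem: each client minimizes an augmented Lagrangian, updates a dual variable, and the server aggregates once per round. The quadratic losses, client count, and penalty parameter rho are illustrative stand-ins, not the paper's actual training setup:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: client k holds a quadratic loss f_k(w) = 0.5*||A_k w - b_k||^2
    # (a stand-in for a local training objective; all names are illustrative).
    d, n_clients, rho = 5, 4, 1.0
    A = [rng.normal(size=(20, d)) for _ in range(n_clients)]
    b = [rng.normal(size=20) for _ in range(n_clients)]

    z = np.zeros(d)                                 # server (global) model
    lam = [np.zeros(d) for _ in range(n_clients)]   # client dual variables

    for _ in range(200):
        w = []
        for k in range(n_clients):
            # Client k minimizes the augmented Lagrangian
            #   f_k(w) + <lam_k, w - z> + (rho/2)||w - z||^2,
            # which has a closed form for a quadratic f_k.
            H = A[k].T @ A[k] + rho * np.eye(d)
            g = A[k].T @ b[k] - lam[k] + rho * z
            w.append(np.linalg.solve(H, g))
        # Server aggregates primal + scaled dual messages (one round trip,
        # matching FedAvg's per-round communication cost).
        z = np.mean([w[k] + lam[k] / rho for k in range(n_clients)], axis=0)
        for k in range(n_clients):
            lam[k] = lam[k] + rho * (w[k] - z)      # dual ascent step
    ```

    For convex local losses like these, the consensus iterates approach the minimizer of the summed objective; the dual variables absorb the mismatch between heterogeneous clients.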
    MTI-Net: A Multi-Target Speech Intelligibility Prediction Model. (arXiv:2204.03310v1 [eess.AS])
    Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies have elaborated deep learning models for estimating intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.
    AI-aided Traffic Control Scheme for M2M Communications in the Internet of Vehicles. (arXiv:2204.03504v1 [cs.NI])
    Due to the rapid growth of data transmissions in the Internet of Vehicles (IoV), finding schemes that can effectively alleviate access congestion has become an important issue. Recently, many traffic control schemes have been studied. Nevertheless, most existing studies do not consider the dynamics of traffic and the heterogeneous requirements of different IoV applications, which are significant for random access resource allocation. In this paper, we consider a hybrid traffic control scheme and use the proximal policy optimization (PPO) method to tackle it. First, IoV devices are divided into various classes based on their delay characteristics, and the objective of maximizing successful packet transmissions subject to a success-rate constraint is established. Then, the optimization objective is transformed into a Markov decision process (MDP) model. Finally, the access class barring (ACB) factors are obtained with the PPO method to maximize the number of devices that access the network successfully. Simulations verify that the proposed algorithm outperforms existing schemes in terms of successful access events and delay.
    Enabling Deep Learning for All-in EDGE paradigm. (arXiv:2204.03326v1 [cs.LG])
    Deep Learning-based models have been widely investigated and have demonstrated significant performance on non-trivial tasks such as speech recognition, image processing, and natural language understanding. However, this comes at the cost of substantial data requirements. Considering the widespread proliferation of edge devices (e.g. Internet of Things devices) over the last decade, Deep Learning in the edge paradigm, such as on device-cloud integrated platforms, is required to leverage its superior performance. Moreover, the edge paradigm is suitable from the data requirements perspective, because the proliferation of edge devices has resulted in an explosion in the volume of generated and collected data. However, Deep Learning applications in real-world scenarios face difficulties due to other requirements, such as high computation, high latency, and high bandwidth. In this regard, this survey paper investigates Deep Learning at the edge, covering its architecture, enabling technologies, and model adaptation techniques, where edge servers and edge devices participate in deep learning training and inference. For simplicity, we call this paradigm the All-in EDGE paradigm. This paper also presents the key performance metrics for Deep Learning in the All-in EDGE paradigm, to evaluate various deep learning techniques and choose a suitable design. Moreover, various open challenges arising from the deployment of Deep Learning in the All-in EDGE paradigm are identified and discussed.
    Standardized feature extraction from pairwise conflicts applied to the train rescheduling problem. (arXiv:2204.03061v1 [cs.LG])
    We propose a train rescheduling algorithm which applies a standardized feature selection based on pairwise conflicts in order to serve as input for the reinforcement learning framework. We implement an analytical method which identifies and optimally solves every conflict arising between two trains, then we design a corresponding observation space which features the most relevant information considering these conflicts. The data obtained this way then translates to actions in the context of the reinforcement learning framework. We test our preliminary model using the evaluation metrics of the Flatland Challenge. The empirical results indicate that the suggested feature space provides meaningful observations, from which a sensible scheduling policy can be learned.
    Optimization Models and Interpretations for Three Types of Adversarial Perturbations against Support Vector Machines. (arXiv:2204.03154v1 [cs.LG])
    Adversarial perturbations have drawn great attention in various deep neural networks. Most of them are computed iteratively and cannot be interpreted very well. In contrast, little attention is paid to basic machine learning models such as support vector machines. In this paper, we investigate the optimization models and the interpretations for three types of adversarial perturbations against support vector machines (SVMs): sample-adversarial perturbations (sAP), class-universal adversarial perturbations (cuAP), and universal adversarial perturbations (uAP). For linear binary/multiclass SVMs, we derive explicit solutions for sAP, cuAP, and uAP (binary case), and an approximate solution for uAP in the multiclass case. We also obtain an upper bound on the fooling rate of uAP. Such results not only increase the interpretability of the three adversarial perturbations, but also provide great computational convenience, since the iterative process can be avoided. Numerical results show that our method is fast and effective in calculating the three types of adversarial perturbations.
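    For intuition on what an explicit solution looks like, the sketch below computes the standard minimal-L2 perturbation that pushes a point onto a linear decision boundary. This is the textbook closed form for a generic linear classifier, which may differ from the paper's exact sAP formulation in margin and sign conventions:

    ```python
    import numpy as np

    def sample_adversarial_perturbation(w, b, x):
        """Minimal L2 perturbation moving x onto the linear decision
        boundary f(x) = w.x + b = 0 (illustrative closed form; no
        iterative optimization is needed)."""
        f = w @ x + b
        return -(f / (w @ w)) * w   # project x onto the hyperplane

    w = np.array([2.0, -1.0])
    b = 0.5
    x = np.array([1.0, 1.0])
    delta = sample_adversarial_perturbation(w, b, x)
    # x + delta lies on the decision boundary, so f(x + delta) ~ 0.
    print(w @ (x + delta) + b)
    ```

    The perturbation is parallel to the weight vector w, which is what makes such solutions directly interpretable, in contrast to iteratively computed perturbations for deep networks.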
    Temporal Alignment for History Representation in Reinforcement Learning. (arXiv:2204.03525v1 [cs.LG])
    Environments in Reinforcement Learning are usually only partially observable. To address this problem, a possible solution is to provide the agent with information about the past. However, providing complete observations of numerous steps can be excessive. Inspired by human memory, we propose to represent history by only the important changes in the environment and, in our approach, to obtain this representation automatically using self-supervision. Our method (TempAl) aligns temporally-close frames, revealing a general, slowly varying state of the environment. This procedure is based on a contrastive loss, which pulls embeddings of nearby observations toward each other while pushing away other samples from the batch. It can be interpreted as a metric that captures the temporal relations of observations. We propose to combine the common instantaneous representation with our history representation, and we evaluate TempAl on all available Atari games from the Arcade Learning Environment. TempAl surpasses the instantaneous-only baseline in 35 environments out of 49. The source code of the method and of all the experiments is available at https://github.com/htdt/tempal.
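    The contrastive objective described above can be sketched as a generic InfoNCE loss over a batch, where each anchor's positive is the embedding of a temporally-close frame and every other sample in the batch acts as a negative. This is an illustrative NumPy version, not the authors' implementation:

    ```python
    import numpy as np

    def info_nce(anchors, positives, temperature=0.5):
        """Batch contrastive loss: pull each anchor embedding toward its
        positive (a temporally-close frame) and push it from all other
        samples in the batch."""
        # L2-normalize so similarities are cosine similarities.
        a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
        p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
        logits = a @ p.T / temperature               # (B, B) similarities
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))           # diagonal = true pairs
    ```

    When anchors match their positives the loss is small; when pairs are misaligned it approaches log(batch size), which is what drives nearby observations toward a shared, slowly varying representation.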
    Multi-task nonparallel support vector machine for classification. (arXiv:2204.02972v1 [cs.LG])
    Direct multi-task twin support vector machine (DMTSVM) explores the shared information between multiple correlated tasks and thereby achieves better generalization performance. However, it involves a matrix inversion operation when solving the dual problems, which is computationally expensive, and the kernel trick cannot be directly utilized in the nonlinear case. To avoid these problems, a novel multi-task nonparallel support vector machine (MTNPSVM), covering both linear and nonlinear cases, is proposed in this paper. By introducing the epsilon-insensitive loss instead of the square loss in DMTSVM, MTNPSVM avoids the matrix inversion operation and takes full advantage of the kernel trick. The theoretical implications of the model are further discussed. To improve computational efficiency, the alternating direction method of multipliers (ADMM) is employed when solving the dual problem, and the computational complexity and convergence of the algorithm are provided. In addition, the properties and sensitivity of the model parameters are explored. Experimental results on fifteen benchmark datasets and twelve image datasets demonstrate the validity of MTNPSVM in comparison with state-of-the-art algorithms. Finally, it is applied to a real Chinese wine dataset, which also verifies its effectiveness.
    Federated Learning for Distributed Spectrum Sensing in NextG Communication Networks. (arXiv:2204.03027v1 [cs.NI])
    NextG networks are intended to provide the flexibility of sharing the spectrum with incumbent users and support various spectrum monitoring tasks such as anomaly detection, fault diagnostics, user equipment identification, and authentication. A network of wireless sensors is needed to monitor the spectrum for signal transmissions of interest over a large deployment area. Each sensor receives signals under a specific channel condition depending on its location and trains an individual model of a deep neural network (DNN) accordingly to classify signals. To improve the accuracy, individual sensors may exchange sensing data or sensor results with each other or with a fusion center (such as in cooperative spectrum sensing). In this paper, distributed federated learning over a multi-hop wireless network is considered to collectively train a DNN for signal identification. In distributed federated learning, each sensor broadcasts its trained model to its neighbors, collects the DNN models from its neighbors, and aggregates them to initialize its own model for the next round of training. Without exchanging any spectrum data, this process is repeated over time such that a common DNN is built across the network while preserving the privacy associated with signals collected at different locations. Signal classification accuracy and convergence time are evaluated for different network topologies (including line, star, ring, grid, and random networks) and packet loss events. Then, the reduction of communication overhead and energy consumption is considered with random participation of sensors in model updates. The results show the feasibility of extending cooperative spectrum sensing over a general multi-hop wireless network through federated learning and indicate its robustness to wireless network effects, thereby sustaining high accuracy with low communication overhead and energy consumption.
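    The neighbor-aggregation step of the distributed federated learning scheme described above can be sketched as follows. The topology, model dimensions, and uniform averaging rule are illustrative assumptions; a real deployment would interleave local training rounds between these aggregations:

    ```python
    import numpy as np

    # Decentralized model averaging sketch: each sensor holds a parameter
    # vector and, each round, averages its own model with those received
    # from its one-hop neighbors. No raw spectrum data is exchanged.
    adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}    # a line network
    models = {i: np.array([float(i)] * 3) for i in adjacency}

    for _ in range(100):
        new_models = {}
        for node, neighbors in adjacency.items():
            received = [models[j] for j in neighbors]     # broadcast step
            new_models[node] = np.mean([models[node]] + received, axis=0)
        models = new_models   # all sensors update synchronously

    # After enough rounds, every sensor holds (approximately) the same
    # consensus model despite only communicating with direct neighbors.
    ```

    The same loop works for star, ring, grid, or random topologies by changing the adjacency map; connectivity and packet loss mainly affect how many rounds consensus takes.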
    Faster algorithms for learning to link, align sequences, and price two-part tariffs. (arXiv:2204.03569v1 [cs.DS])
    Data-driven algorithm configuration is a promising, learning-based approach for beyond worst-case analysis of algorithms with tunable parameters. An important open problem is the design of efficient data-driven algorithms for algorithm families with more than one parameter. In this work we provide algorithms for efficient (output-polynomial) multidimensional parameter tuning, i.e. for families with a small constant number of parameters, for three very different combinatorial problems -- linkage-based clustering, dynamic programming for sequence alignment, and auction design for two-part tariff schemes. We extend the single-parameter clustering algorithm of Balcan et al. 2020 arXiv:1907.00533 to multiple parameters and to the sequence alignment problem by proposing an execution graph which compactly represents all the states the algorithm could attain for all possible parameter values. A key problem-specific challenge is to efficiently compute how the partition of the parameter space (into regions with unique algorithmic states) changes with a single algorithmic step. We give algorithms which improve on the runtime of previously best known results for linkage-based clustering, sequence alignment and two-part tariff pricing.
    The Effects of Regularization and Data Augmentation are Class Dependent. (arXiv:2204.03632v1 [cs.LG])
    Regularization is a fundamental technique to prevent over-fitting and to improve generalization performance by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found by cross-validation leads to disastrous model performance on some classes, e.g. on Imagenet with a resnet50, the "barn spider" classification test accuracy falls from $68\%$ to $46\%$ merely by introducing random crop DA during training. Even more surprising, such performance drops also appear when introducing uninformative regularization techniques such as weight decay. Those results demonstrate that our search for ever-increasing generalization performance, averaged over all classes and samples, has left us with models and regularizers that silently sacrifice performance on some classes. This scenario can become dangerous when deploying a model on downstream tasks, e.g. an Imagenet pre-trained resnet50 deployed on INaturalist sees its performance fall from $70\%$ to $30\%$ on class \#8889 when random crop DA is introduced during the Imagenet pre-training phase. Those results demonstrate that designing novel regularizers without class-dependent bias remains an open research question.
    Knowledge Infused Decoding. (arXiv:2204.03084v1 [cs.CL])
    Pre-trained language models (LMs) have been shown to memorize a substantial amount of knowledge from the pre-training corpora; however, they are still limited in recalling factually correct knowledge given a certain context. Hence, they tend to suffer from counterfactual or hallucinatory generation when used in knowledge-intensive natural language generation (NLG) tasks. Recent remedies to this problem focus on modifying either the pre-training or task fine-tuning objectives to incorporate knowledge, which normally require additional costly training or architecture modification of LMs for practical applications. We present Knowledge Infused Decoding (KID) -- a novel decoding algorithm for generative LMs, which dynamically infuses external knowledge into each step of the LM decoding. Specifically, we maintain a local knowledge memory based on the current context, interacting with a dynamically created external knowledge trie, and continuously update the local memory as a knowledge-aware constraint to guide decoding via reinforcement learning. On six diverse knowledge-intensive NLG tasks, task-agnostic LMs (e.g., GPT-2 and BART) armed with KID outperform many task-optimized state-of-the-art models, and show particularly strong performance in few-shot scenarios over seven related knowledge-infusion techniques. Human evaluation confirms KID's ability to generate more relevant and factual language for the input context when compared with multiple baselines. Finally, KID also alleviates exposure bias and provides stable generation quality when generating longer sequences. Code for KID is available at https://github.com/microsoft/KID.
    Enhancement on Model Interpretability and Sleep Stage Scoring Performance with A Novel Pipeline Based on Deep Neural Network. (arXiv:2204.03173v1 [cs.LG])
    Considering the natural frequency characteristics in sleep medicine, this paper first proposes a time-frequency framework for representation learning of the electroencephalogram (EEG), following the definitions of the American Academy of Sleep Medicine. To accommodate the temporally random and transient nature of the defining characteristics of sleep stages, we further design a context-sensitive flexible pipeline that automatically adapts to the attributes of the data itself. That is, the input EEG spectrogram is partitioned into a sequence of patches along the time and frequency axes, which are then fed into a deep learning network for further representation learning to extract stage-dependent features, which are finally used in the classification step. The proposed pipeline is validated on a large database, the Sleep Heart Health Study (SHHS), and the results demonstrate that its performance for the wake, N2, and N3 stages outperforms state-of-the-art works, with F1 scores of 0.93, 0.88, and 0.87, respectively; the proposed method also achieves a high inter-rater reliability of 0.80 kappa. Importantly, we visualize the stage scoring process behind the model's decisions with the Layer-wise Relevance Propagation (LRP) method, which shows that the proposed pipeline is more sensitive and transparent in its decision-making than the baseline pipelines. Therefore, the pipeline together with the LRP method can provide better model interpretability, which is important for clinical support.
    Video Diffusion Models. (arXiv:2204.03458v1 [cs.CV])
    Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial results. Our model is a natural extension of the standard image diffusion architecture, and it enables jointly training from image and video data, which we find to reduce the variance of minibatch gradients and speed up optimization. To generate long and higher resolution videos we introduce a new conditional sampling technique for spatial and temporal video extension that performs better than previously proposed methods. We present the first results on a large text-conditioned video generation task, as well as state-of-the-art results on an established unconditional video generation benchmark. Supplementary material is available at https://video-diffusion.github.io/
    Incremental Unsupervised Feature Selection for Dynamic Incomplete Multi-view Data. (arXiv:2204.02973v1 [cs.LG])
    Multi-view unsupervised feature selection has been proven efficient in reducing the dimensionality of high-dimensional multi-view unlabeled data. Previous methods assume that all views are complete. However, in real applications, multi-view data are often incomplete, i.e., some views of instances are missing, which causes these methods to fail. Besides, when the data arrive in the form of streams, these methods suffer from high storage costs and expensive computation. To address these issues, we propose an Incremental Incomplete Multi-view Unsupervised Feature Selection method (I$^2$MUFS) for incomplete multi-view streaming data. By jointly considering the consistent and complementary information across different views, I$^2$MUFS embeds unsupervised feature selection into an extended weighted non-negative matrix factorization model, which can learn a consensus clustering indicator matrix and fuse different latent feature matrices with adaptive view weights. Furthermore, we introduce incremental learning mechanisms to develop an alternating iterative algorithm in which the feature selection matrix is updated incrementally, rather than recomputed from scratch on the entire updated data. A series of experiments are conducted to verify the effectiveness of the proposed method in comparison with several state-of-the-art methods. The experimental results demonstrate its effectiveness and efficiency in terms of clustering metrics and computational cost.
    Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation. (arXiv:2204.03293v1 [cs.SE])
    Code search aims to retrieve the most semantically relevant code snippet for a given natural language query. Recently, large-scale code pre-trained models such as CodeBERT and GraphCodeBERT learn generic representations of source code and have achieved substantial improvement on code search task. However, the high-quality sequence-level representations of code snippets have not been sufficiently explored. In this paper, we propose a new approach with multimodal contrastive learning and soft data augmentation for code search. Multimodal contrastive learning is used to pull together the representations of code-query pairs and push apart the unpaired code snippets and queries. Moreover, data augmentation is critical in contrastive learning for learning high-quality representations. However, only semantic-preserving augmentations for source code are considered in existing work. In this work, we propose to do soft data augmentation by dynamically masking and replacing some tokens in code sequences to generate code snippets that are similar but not necessarily semantic-preserving as positive samples for paired queries. We conduct extensive experiments to evaluate the effectiveness of our approach on a large-scale dataset with six programming languages. The experimental results show that our approach significantly outperforms the state-of-the-art methods. We also adapt our techniques to several pre-trained models such as RoBERTa and CodeBERT, and significantly boost their performance on the code search task.
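    A minimal sketch of the soft data augmentation idea follows, assuming simple whitespace tokenization and illustrative masking/replacement rates; the actual method operates on the model's token sequences, and the function and token names here are placeholders:

    ```python
    import random

    def soft_augment(tokens, vocab, mask_rate=0.15, mask_token="<mask>"):
        """Soft data augmentation sketch: randomly mask or replace tokens
        in a code sequence to create a positive sample that is similar but
        not necessarily semantics-preserving."""
        out = []
        for tok in tokens:
            r = random.random()
            if r < mask_rate / 2:
                out.append(mask_token)               # dynamic masking
            elif r < mask_rate:
                out.append(random.choice(vocab))     # dynamic replacement
            else:
                out.append(tok)                      # keep original token
        return out

    code = "def add ( a , b ) : return a + b".split()
    augmented = soft_augment(code, vocab=code)
    ```

    The augmented sequence then serves as a positive sample for the paired query in the contrastive objective, without requiring the perturbed code to remain compilable or semantics-preserving.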
    Self supervised learning for robust voice cloning. (arXiv:2204.03421v1 [cs.SD])
    Voice cloning is a difficult task which requires robust and informative features incorporated in a high quality TTS system in order to effectively copy an unseen speaker's voice. In our work, we utilize features learned in a self-supervised framework via the Bootstrap Your Own Latent (BYOL) method, which is shown to produce high quality speech representations when specific audio augmentations are applied to the vanilla algorithm. We further extend the augmentations in the training procedure to aid the resulting features to capture the speaker identity and to make them robust to noise and acoustic conditions. The learned features are used as pre-trained utterance-level embeddings and as inputs to a Non-Attentive Tacotron based architecture, aiming to achieve multispeaker speech synthesis without utilizing additional speaker features. This method enables us to train our model in an unlabeled multispeaker dataset as well as use unseen speaker embeddings to copy a speaker's voice. Subjective and objective evaluations are used to validate the proposed model, as well as the robustness to the acoustic conditions of the target utterance.
    Adaptive Spike-Like Representation of EEG Signals for Sleep Stages Scoring. (arXiv:2204.03565v1 [eess.SP])
    Recently, promising results have been reported on automatic sleep stage scoring by extracting spatio-temporal features from the electroencephalogram (EEG). Such methods entail laborious manual feature engineering and domain knowledge. In this study, we propose an adaptive scheme to probabilistically encode, filter, and accumulate the input signals, and to weight the resultant features by the half-Gaussian probabilities of the signal intensities. The adaptive representations are subsequently fed into a transformer model to automatically mine the relevance between features and the corresponding stages. Extensive experiments on the largest public dataset against state-of-the-art methods validate the effectiveness of our proposed method and reveal promising future directions.
    Deep transfer learning for system identification using long short-term memory neural networks. (arXiv:2204.03125v1 [eess.SY])
    Recurrent neural networks (RNNs) have many advantages over more traditional system identification techniques. They can be applied to linear and nonlinear systems and require fewer modeling assumptions. However, these neural network models may also need larger amounts of data to learn and generalize, and neural network training is a time-consuming process. Hence, building upon long short-term memory (LSTM) neural networks, this paper proposes using two types of deep transfer learning, namely parameter fine-tuning and freezing, to reduce the data and computation requirements for system identification. We apply these techniques to identify two dynamical systems: a second-order linear system and a Wiener-Hammerstein nonlinear system. Results show that, compared with direct learning, our method accelerates learning by 10% to 50% while also saving data and computing resources.
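    The freezing variant can be sketched as follows: parameters copied from a source model are partitioned into frozen and trainable sets, and gradient updates simply skip the frozen ones. The layer names, shapes, and plain SGD step are illustrative placeholders for the paper's LSTM training loop:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Transfer-learning sketch: a "source" model's parameters are copied to
    # the target model; early (feature-extracting) layers are frozen and
    # only later layers are fine-tuned on the target system's data.
    source_params = {"lstm1": rng.normal(size=(4, 4)),
                     "lstm2": rng.normal(size=(4, 4)),
                     "output": rng.normal(size=(4, 1))}

    target_params = {k: v.copy() for k, v in source_params.items()}
    frozen = {"lstm1"}        # parameter-freezing variant
    lr = 0.1

    def sgd_step(params, grads):
        for name in params:
            if name not in frozen:           # frozen layers are skipped
                params[name] -= lr * grads[name]

    # One illustrative update with dummy gradients:
    grads = {k: np.ones_like(v) for k, v in target_params.items()}
    sgd_step(target_params, grads)
    # lstm1 keeps the source weights; lstm2 and output have moved.
    ```

    Fine-tuning is the limiting case with an empty frozen set: all layers start from the source weights but every one of them is updated on the target data.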
    Covariate-assisted Sparse Tensor Completion. (arXiv:2103.06428v3 [stat.ML] UPDATED)
    We aim to provably complete a sparse and highly-missing tensor in the presence of covariate information along tensor modes. Our motivation comes from online advertising, where users' click-through rates (CTRs) on ads over various devices form a CTR tensor that has about 96% missing entries and many zeros among the non-missing entries, which makes standalone tensor completion methods unsatisfactory. Besides the CTR tensor, additional ad features or user characteristics are often available. In this paper, we propose Covariate-assisted Sparse Tensor Completion (COSTCO) to incorporate covariate information for the recovery of the sparse tensor. The key idea is to jointly extract latent components from both the tensor and the covariate matrix to learn a synthetic representation. Theoretically, we derive the error bound for the recovered tensor components and explicitly quantify the improvements on both the reveal probability condition and the tensor recovery accuracy due to covariates. Finally, we apply COSTCO to an advertisement dataset consisting of a CTR tensor and an ad covariate matrix, leading to a 23% accuracy improvement over the baseline. An important by-product is that the ad latent components from COSTCO reveal interesting ad clusters, which are useful for better ad targeting.  ( 2 min )
    Covariance matrix preparation for quantum principal component analysis. (arXiv:2204.03495v1 [quant-ph])
    Principal component analysis (PCA) is a dimensionality reduction method in data analysis that involves diagonalizing the covariance matrix of the dataset. Recently, quantum algorithms have been formulated for PCA based on diagonalizing a density matrix. These algorithms assume that the covariance matrix can be encoded in a density matrix, but a concrete protocol for this encoding has been lacking. Our work aims to address this gap. Assuming amplitude encoding of the data, given by the ensemble $\{p_i,| \psi_i \rangle\}$, one can easily prepare the ensemble average density matrix $\overline{\rho} = \sum_i p_i |\psi_i\rangle \langle \psi_i |$. We first show that $\overline{\rho}$ is precisely the covariance matrix whenever the dataset is centered. For quantum datasets, we exploit global phase symmetry to argue that there always exists a centered dataset consistent with $\overline{\rho}$, and hence $\overline{\rho}$ can always be interpreted as a covariance matrix. This provides a simple means for preparing the covariance matrix for arbitrary quantum datasets or centered classical datasets. For uncentered classical datasets, our method is the so-called "PCA without centering", which we interpret as PCA on a symmetrized dataset. We argue that this closely corresponds to standard PCA, and we derive equations and inequalities that bound the deviation of the spectrum obtained with our method from that of standard PCA. We numerically illustrate our method for the MNIST handwritten digit dataset. We also argue that PCA on quantum datasets is natural and meaningful, and we numerically implement our method for molecular ground-state datasets.  ( 2 min )
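As a quick numerical check of the centered-data claim, the following NumPy sketch (our illustration, not the paper's protocol) amplitude-encodes a centered dataset, assigning each state a probability proportional to its squared norm (a convention assumed here), and verifies that the ensemble-average density matrix equals the covariance matrix normalized to unit trace:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # 50 samples, 4 features
X = X - X.mean(axis=0)         # center the dataset

# Amplitude encoding: each sample becomes a normalized state |psi_i>,
# with probability p_i proportional to its squared norm (assumed convention).
norms2 = (X ** 2).sum(axis=1)
p = norms2 / norms2.sum()
psi = X / np.sqrt(norms2)[:, None]

# Ensemble-average density matrix: rho = sum_i p_i |psi_i><psi_i|
rho = np.einsum("i,ij,ik->jk", p, psi, psi)

# For centered data this is the covariance matrix, normalized to unit trace.
C = X.T @ X
assert np.allclose(rho, C / np.trace(C))
assert np.isclose(np.trace(rho), 1.0)
```

With this weighting, the normalizations cancel: $\overline{\rho} = \sum_i x_i x_i^\top / \sum_j \|x_j\|^2$, which is exactly the (unnormalized) covariance matrix divided by its trace.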
    Distributionally Robust Optimal Power Flow with Contextual Information. (arXiv:2109.07896v2 [math.OC] UPDATED)
    In this paper, we develop a distributionally robust chance-constrained formulation of the Optimal Power Flow problem (OPF) whereby the system operator can leverage contextual information. For this purpose, we exploit an ambiguity set based on probability trimmings and optimal transport through which the dispatch solution is protected against the incomplete knowledge of the relationship between the OPF uncertainties and the context that is conveyed by a sample of their joint probability distribution. We provide a tractable reformulation of the proposed distributionally robust chance-constrained OPF problem under the popular conditional-value-at-risk approximation. By way of numerical experiments run on a modified IEEE-118 bus network with wind uncertainty, we show how the power system can substantially benefit from taking into account the well-known statistical dependence between the point forecast of wind power outputs and its associated prediction error. Furthermore, the experiments conducted also reveal that the distributional robustness conferred on the OPF solution by our probability-trimmings-based approach is superior to that bestowed by alternative approaches in terms of expected cost and system reliability.  ( 2 min )
    A comparison of mixed-variables Bayesian optimization approaches. (arXiv:2111.01533v2 [math.OC] UPDATED)
    Most real optimization problems are defined over a mixed search space where the variables are both discrete and continuous. In engineering applications, the objective function is typically calculated with a numerically costly black-box simulation. General mixed and costly optimization problems are therefore of great practical interest, yet their resolution remains largely an open scientific question. In this article, costly mixed problems are approached through Gaussian processes where the discrete variables are relaxed into continuous latent variables. The continuous space is more easily explored by classical Bayesian optimization techniques than a mixed space would be. Discrete variables are recovered either subsequently to the continuous optimization, or simultaneously with an additional continuous-discrete compatibility constraint that is handled with augmented Lagrangians. Several possible implementations of such Bayesian mixed optimizers are compared. In particular, the reformulation of the problem with continuous latent variables is put in competition with searches working directly in the mixed space. Among the algorithms involving latent variables and an augmented Lagrangian, particular attention is devoted to the Lagrange multipliers, for which local and global estimation techniques are studied. The comparisons are based on the repeated optimization of three analytical functions and a beam design problem.  ( 2 min )
    Mo\"ET: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning. (arXiv:1906.06717v4 [cs.LG] UPDATED)
    Rapid advancements in deep learning have led to many recent breakthroughs. While deep learning models achieve superior performance, often statistically better than humans, their adoption into safety-critical settings, such as healthcare or self-driving cars, is hindered by their inability to provide safety guarantees or to expose the inner workings of the model in a human-understandable form. We present Mo\"ET, a novel model based on Mixture of Experts, consisting of decision tree experts and a generalized linear model gating function. Thanks to such a gating function, the model is more expressive than a standard decision tree. To support non-differentiable decision trees as experts, we formulate a novel training procedure. In addition, we introduce a hard thresholding version, Mo\"ETH, in which predictions are made solely by a single expert chosen via the gating function. Thanks to that property, Mo\"ETH allows each prediction to be easily decomposed into a set of logical rules in a form which can be easily verified. While Mo\"ET is a general-use model, we illustrate its power in the reinforcement learning setting. By training Mo\"ET models using an imitation learning procedure on deep RL agents, we outperform the previous state-of-the-art technique based on decision trees while preserving the verifiability of the models. Moreover, we show that Mo\"ET can also be used in real-world supervised problems, on which it outperforms other verifiable machine learning models.  ( 3 min )
    Amortized Auto-Tuning: Cost-Efficient Bayesian Transfer Optimization for Hyperparameter Recommendation. (arXiv:2106.09179v2 [cs.LG] UPDATED)
    With the surge in the number of hyperparameters and training times of modern machine learning models, hyperparameter tuning is becoming increasingly expensive. However, after assessing 40 tuning methods systematically, we find that each faces certain limitations. In particular, methods that speed up tuning via knowledge transfer typically require the final performance of hyperparameters and do not focus on low-fidelity information. As we demonstrate empirically, this common practice is suboptimal and can incur an unnecessary use of resources. It is more cost-efficient to instead leverage low-fidelity tuning observations to measure inter-task similarity and transfer knowledge from existing to new tasks accordingly. However, performing multi-fidelity tuning comes with its own challenges in the transfer setting: the noise in additional observations and the need for performance forecasting. Therefore, we propose and conduct a thorough analysis of a multi-task multi-fidelity Bayesian optimization framework, which leads to the best instantiation--amortized auto-tuning (AT2). We further present an offline-computed 27-task hyperparameter recommendation (HyperRec) database to serve the community. Extensive experiments on HyperRec and other real-world databases illustrate the effectiveness of our AT2 method.  ( 2 min )
    A Joint Learning Approach for Semi-supervised Neural Topic Modeling. (arXiv:2204.03208v1 [cs.IR])
    Topic models are some of the most popular ways to represent textual data in an interpretable manner. Recently, advances in deep generative models, specifically auto-encoding variational Bayes (AEVB), have led to the introduction of unsupervised neural topic models, which leverage deep generative models as opposed to traditional statistics-based topic models. We extend these neural topic models by introducing the Label-Indexed Neural Topic Model (LI-NTM), which is, to the best of our knowledge, the first effective upstream semi-supervised neural topic model. We find that LI-NTM outperforms existing neural topic models in document reconstruction benchmarks, with the most notable results in low labeled data regimes and for datasets with informative labels; furthermore, our jointly learned classifier outperforms baseline classifiers in ablation studies.  ( 2 min )
    Differentially Private Set Union. (arXiv:2002.09745v2 [cs.CR] UPDATED)
    We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real world applications; it is particularly ubiquitous in natural language processing (NLP) applications such as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is significantly more delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.  ( 3 min )
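For context, the independent-contribution baseline that the paper improves upon can be sketched in a few lines. This is a hedged illustration, not the paper's policy-based algorithms: the function name is ours, and the threshold shown is a standard Laplace-threshold calibration assumed for the sketch.

```python
import numpy as np

def dp_set_union(user_sets, eps, delta, max_contrib=1):
    """Independent-contribution DP set union: cap each user's contribution,
    add Laplace noise to item counts, release items above a threshold."""
    rng = np.random.default_rng(1)
    counts = {}
    for W in user_sets:
        for item in sorted(W)[:max_contrib]:   # each user adds at most max_contrib items
            counts[item] = counts.get(item, 0) + 1
    # Standard calibration: an item contributed by a single user should pass
    # the noisy threshold with probability at most delta.
    threshold = 1 + (max_contrib / eps) * np.log(1 / (2 * delta))
    return {item for item, c in counts.items()
            if c + rng.laplace(scale=max_contrib / eps) > threshold}

released = dp_set_union([{"the"}] * 100 + [{"rareword"}], eps=1.0, delta=1e-3)
assert "the" in released   # a word held by 100 users clears the noisy threshold
```

The waste the abstract describes is visible here: the count for "the" lands far above the threshold, yet each user's contribution budget was spent on it anyway.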
    Visualizing Deep Neural Networks with Topographic Activation Maps. (arXiv:2204.03528v1 [cs.LG])
    Machine Learning with Deep Neural Networks (DNNs) has become a successful tool in solving tasks across various fields of application. The success of DNNs is strongly connected to their high complexity in terms of the number of network layers or of neurons in each layer, which makes it difficult to understand how DNNs solve their learned task. To improve the explainability of DNNs, we adapt methods from neuroscience, because this field has rich experience in analyzing complex and opaque systems. In this work, we draw inspiration from how neuroscience uses topographic maps to visualize the activity of the brain when it performs certain tasks. Transferring this approach to DNNs can help to visualize and understand their internal processes more intuitively, too. However, the inner structures of brains and DNNs differ substantially. Therefore, to visualize activations of neurons in DNNs as topographic maps, we investigate techniques to lay out the neurons in a two-dimensional space in which neurons of similar activity are in the vicinity of each other. In this work, we introduce and compare different methods to obtain a topographic layout of the neurons in a network layer. Moreover, we demonstrate how to use the resulting topographic activation maps to identify errors or encoded biases in DNNs or data sets. Our novel visualization technique improves the transparency of DNN-based algorithmic decision-making systems and is accessible to a broad audience because topographic maps are intuitive to interpret without expert knowledge in Machine Learning.
    The Effects of Regularization and Data Augmentation are Class Dependent. (arXiv:2204.03632v1 [cs.LG])
    Regularization is a fundamental technique to prevent over-fitting and to improve generalization performance by constraining a model's complexity. Current Deep Networks heavily rely on regularizers such as Data-Augmentation (DA) or weight-decay, and employ structural risk minimization, i.e. cross-validation, to select the optimal regularization hyper-parameters. In this study, we demonstrate that techniques such as DA or weight decay produce a model with reduced complexity that is unfair across classes. The optimal amount of DA or weight decay found via cross-validation leads to disastrous model performance on some classes, e.g. on Imagenet with a resnet50, the "barn spider" classification test accuracy falls from $68\%$ to $46\%$ merely by introducing random crop DA during training. Even more surprisingly, such a performance drop also appears when introducing uninformative regularization techniques such as weight decay. These results demonstrate that our search for ever-increasing generalization performance -- averaged over all classes and samples -- has left us with models and regularizers that silently sacrifice performance on some classes. This scenario can become dangerous when deploying a model on downstream tasks, e.g. an Imagenet pre-trained resnet50 deployed on INaturalist sees its performance fall from $70\%$ to $30\%$ on class \#8889 when random crop DA is introduced during the Imagenet pre-training phase. These results demonstrate that designing novel regularizers without class-dependent bias remains an open research question.
    Categorical Distributions of Maximum Entropy under Marginal Constraints. (arXiv:2204.03406v1 [hep-th])
    The estimation of categorical distributions under marginal constraints summarizing some sample from a population in the most generalizable way is key for many machine-learning and data-driven approaches. We provide a parameter-agnostic theoretical framework that enables this task ensuring (i) that a categorical distribution of Maximum Entropy under marginal constraints always exists and (ii) that it is unique. The procedure of iterative proportional fitting (IPF) naturally estimates that distribution from any consistent set of marginal constraints directly in the space of probabilities, thus deductively identifying a least-biased characterization of the population. The theoretical framework together with IPF leads to a holistic workflow that enables modeling any class of categorical distributions solely using the phenomenological information provided.  ( 2 min )
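IPF itself is a short, classical algorithm: alternately rescale the table so its row sums, then its column sums, match the target marginals. A minimal two-way NumPy sketch (our illustration; the table size and marginals are arbitrary):

```python
import numpy as np

def ipf(table, row_marg, col_marg, iters=200):
    """Iterative proportional fitting: alternately rescale rows and columns
    until the table matches the target marginals."""
    T = table.astype(float).copy()
    for _ in range(iters):
        T *= (row_marg / T.sum(axis=1))[:, None]   # match row sums
        T *= (col_marg / T.sum(axis=0))[None, :]   # match column sums
    return T

# Starting from a uniform table (the maximum-entropy seed) and fitting a
# consistent set of marginals yields the least-biased joint distribution.
T = ipf(np.ones((2, 3)), np.array([0.4, 0.6]), np.array([0.2, 0.3, 0.5]))
assert np.allclose(T.sum(axis=1), [0.4, 0.6])
assert np.allclose(T.sum(axis=0), [0.2, 0.3, 0.5])
```

Because the iterations only rescale, the fitted table stays within the family generated by the seed, which is what makes the maximum-entropy interpretation possible.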
    GFlowNet Foundations. (arXiv:2111.09266v2 [cs.LG] UPDATED)
    Generative Flow Networks (GFlowNets) have been introduced as a method to sample a diverse set of candidates in an active learning context, with a training objective that makes them approximately sample in proportion to a given reward function. In this paper, we show a number of additional theoretical properties of GFlowNets. They can be used to estimate joint probability distributions and the corresponding marginal distributions where some variables are unspecified and, of particular interest, can represent distributions over composite objects like sets and graphs. GFlowNets amortize the work typically done by computationally expensive MCMC methods in a single but trained generative pass. They could also be used to estimate partition functions and free energies, conditional probabilities of supersets (supergraphs) given a subset (subgraph), as well as marginal distributions over all supersets (supergraphs) of a given set (graph). We introduce variations enabling the estimation of entropy and mutual information, sampling from a Pareto frontier, connections to reward-maximizing policies, and extensions to stochastic environments, continuous actions and modular energy functions.  ( 2 min )
    Flexible Amortized Variational Inference in qBOLD MRI. (arXiv:2203.05845v2 [eess.IV] UPDATED)
    Streamlined qBOLD acquisitions enable experimentally straightforward observations of brain oxygen metabolism. $R_2^\prime$ maps are easily inferred; however, the oxygen extraction fraction (OEF) and deoxygenated blood volume (DBV) are more ambiguously determined from the data. As such, existing inference methods tend to yield very noisy and underestimated OEF maps, while overestimating DBV. This work describes a novel probabilistic machine learning approach that can infer plausible distributions of OEF and DBV. Initially, we create a model that produces an informative voxelwise prior distribution based on synthetic training data. Contrary to prior work, we model the joint distribution of OEF and DBV through a scaled multivariate logit-Normal distribution, which enables the values to be constrained within a plausible range. The prior distribution model is used to train an efficient amortized variational Bayesian inference model. This model learns to infer OEF and DBV by predicting real image data, requiring little training data, using the signal equations as a forward model. We demonstrate that our approach enables the inference of smooth OEF and DBV maps, with a physiologically plausible distribution that can be adapted through specification of an informative prior distribution. Other benefits include model comparison (via the evidence lower bound) and uncertainty quantification for identifying image artefacts. Results are demonstrated on a small study comparing subjects undergoing hyperventilation and at rest. We illustrate that the proposed approach allows measurement of gray matter differences in OEF and DBV and enables voxelwise comparison between conditions, where we observe significant increases in OEF and $R_2^\prime$ during hyperventilation.  ( 2 min )
    Multiscale Clustering of Hyperspectral Images Through Spectral-Spatial Diffusion Geometry. (arXiv:2103.15783v2 [cs.LG] UPDATED)
    Clustering algorithms partition a dataset into groups of similar points. The primary contribution of this article is the Multiscale Spatially-Regularized Diffusion Learning (M-SRDL) clustering algorithm, which uses spatially-regularized diffusion distances to efficiently and accurately learn multiple scales of latent structure in hyperspectral images. The M-SRDL clustering algorithm extracts clusterings at many scales from a hyperspectral image and outputs the variation-of-information barycenter of these clusterings as an exemplar for all underlying cluster structure. We show that incorporating spatial regularization into a multiscale clustering framework results in smoother and more coherent clusters when applied to hyperspectral data, yielding more accurate clustering labels.  ( 2 min )
    Sliced gradient-enhanced Kriging for high-dimensional function approximation. (arXiv:2204.03562v1 [stat.ML])
    Gradient-enhanced Kriging (GE-Kriging) is a well-established surrogate modelling technique for approximating expensive computational models. However, it tends to become impractical for high-dimensional problems due to the large inherent correlation matrix and the associated high-dimensional hyper-parameter tuning problem. To address these issues, we propose a new method in this paper, called sliced GE-Kriging (SGE-Kriging), for reducing both the size of the correlation matrix and the number of hyper-parameters. Firstly, we perform a derivative-based global sensitivity analysis to detect the relative importance of each input variable with respect to the model response. Then, we propose to split the training sample set into multiple slices, and invoke Bayes' theorem to approximate the full likelihood function via a sliced likelihood function, in which multiple small correlation matrices are utilized to describe the correlation of the sample set. Additionally, we replace the original high-dimensional hyper-parameter tuning problem with a low-dimensional counterpart by learning the relationship between the hyper-parameters and the global sensitivity indices. Finally, we validate SGE-Kriging by means of numerical experiments with several benchmark problems. The results show that the SGE-Kriging model features accuracy and robustness comparable to the standard model but at a much lower training cost. The benefits are most evident in high-dimensional problems.
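The sliced-likelihood idea can be illustrated in a few lines of NumPy. This is a sketch of the general principle rather than the paper's estimator: the squared-exponential correlation and the even slicing are our assumptions. With a single slice the approximation reduces exactly to the full likelihood:

```python
import numpy as np

def corr(A, B, theta=5.0):
    """Toy squared-exponential correlation (with a small nugget)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-theta * d2) + 1e-8 * np.eye(len(A))

def gauss_loglik(y, R):
    """Zero-mean Gaussian log-likelihood with correlation matrix R."""
    _, logdet = np.linalg.slogdet(R)
    return -0.5 * (logdet + y @ np.linalg.solve(R, y) + len(y) * np.log(2 * np.pi))

def sliced_loglik(y, X, n_slices):
    """Sum of per-slice log-likelihoods: several small correlation
    matrices replace one large one."""
    slices = np.array_split(np.arange(len(y)), n_slices)
    return sum(gauss_loglik(y[s], corr(X[s], X[s])) for s in slices)

rng = np.random.default_rng(0)
X, y = rng.uniform(size=(40, 3)), rng.normal(size=40)
full = gauss_loglik(y, corr(X, X))
approx = sliced_loglik(y, X, n_slices=4)   # four 10x10 solves instead of one 40x40
assert np.isclose(sliced_loglik(y, X, n_slices=1), full)   # one slice = full likelihood
```

The computational win is the cubic cost of the factorizations: $k$ slices of $n/k$ samples cost $O(n^3/k^2)$ instead of $O(n^3)$.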
    Data blurring: sample splitting a single sample. (arXiv:2112.11079v2 [stat.ME] UPDATED)
    Suppose we observe a random vector $X$ from some distribution $P$ in a known family with unknown parameters. We ask the following question: when is it possible to split $X$ into two parts $f(X)$ and $g(X)$ such that neither part is sufficient to reconstruct $X$ by itself, but both together can recover $X$ fully, and the joint distribution of $(f(X),g(X))$ is tractable? As one example, if $X=(X_1,\dots,X_n)$ and $P$ is a product distribution, then for any $m<n$, we can split the sample to define $f(X)=(X_1,\dots,X_m)$ and $g(X)=(X_{m+1},\dots,X_n)$. Rasines and Young (2021) offer an alternative route to accomplishing this task through randomization of $X$ with additive Gaussian noise, which enables post-selection inference in finite samples for Gaussian-distributed data and asymptotically for non-Gaussian additive models. In this paper, we offer a more general methodology for achieving such a split in finite samples by borrowing ideas from Bayesian inference to yield a (frequentist) solution that can be viewed as a continuous analog of data splitting. We call our method data blurring, as an alternative to data splitting, data carving and p-value masking. We exemplify the method on a few prototypical applications, such as post-selection inference for trend filtering and other regression problems.
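The Gaussian-randomization route attributed to Rasines and Young can be sketched as follows. This is our illustration under the known-variance Gaussian setting; gamma is a tuning parameter we introduce. The two parts are jointly Gaussian with zero covariance (hence independent), yet $X$ is exactly recoverable from the pair:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, gamma = 1.0, 0.5                       # known noise level; gamma trades off the parts
X = rng.normal(loc=2.0, scale=sigma, size=10_000)

# Randomize with Gaussian noise scaled to the (known) data variance:
# Cov(U, V) = Var(X) - sigma^2 = 0, so U and V are independent Gaussians.
Z = rng.normal(scale=sigma, size=X.shape)
U = X + gamma * Z          # f(X): used e.g. for model selection
V = X - Z / gamma          # g(X): reserved for inference

assert np.allclose((U + gamma**2 * V) / (1 + gamma**2), X)   # exact reconstruction
assert abs(np.corrcoef(U, V)[0, 1]) < 0.05                   # empirically uncorrelated
```

Neither part alone determines $X$ (each is $X$ plus independent noise), but the weighted average $(U + \gamma^2 V)/(1+\gamma^2)$ cancels the noise exactly, which is the split the paper's opening question asks for.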
    Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning. (arXiv:2108.03706v2 [stat.ML] UPDATED)
    The recent emergence of reinforcement learning has created a demand for robust statistical inference methods for the parameter estimates computed using these algorithms. Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations, while existing statistical inference methods in reinforcement learning (RL) are limited to the batch setting. The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise, such as RL, has yet to be explored. In this paper, we study the use of the online bootstrap method for statistical inference in RL. In particular, we focus on the temporal difference (TD) learning and Gradient TD (GTD) learning algorithms, which are themselves special instances of linear stochastic approximation under Markov noise. The method is shown to be distributionally consistent for statistical inference in policy evaluation, and numerical experiments are included to demonstrate the effectiveness of this algorithm at statistical inference tasks across a range of real RL environments.  ( 2 min )
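The multiplier-bootstrap flavor of the online bootstrap can be sketched for TD(0) as follows. This is our illustration of the general idea, not the paper's exact scheme: each bootstrap replicate scales its TD update by an independent mean-one random weight, and the spread of the replicates estimates the sampling variability of the point estimate.

```python
import numpy as np

def td0_online_bootstrap(transitions, phi_dim, n_boot=50, alpha=0.05, gamma=0.9):
    """Run linear TD(0) while maintaining n_boot perturbed replicates whose
    updates are scaled by independent mean-one random weights."""
    rng = np.random.default_rng(0)
    theta = np.zeros(phi_dim)                 # point estimate
    boots = np.zeros((n_boot, phi_dim))       # bootstrap replicates
    for phi, r, phi_next in transitions:
        theta += alpha * (r + gamma * theta @ phi_next - theta @ phi) * phi
        delta = r + gamma * boots @ phi_next - boots @ phi    # per-replicate TD errors
        w = rng.exponential(size=n_boot)                      # mean-one multipliers
        boots += alpha * (w * delta)[:, None] * phi[None, :]
    return theta, boots

# Toy chain: one feature, reward 1 per step, so the value is 1/(1-0.9) = 10.
phi = np.array([1.0])
theta, boots = td0_online_bootstrap([(phi, 1.0, phi)] * 2000, phi_dim=1)
assert abs(theta[0] - 10.0) < 0.5
ci = np.percentile(boots[:, 0], [2.5, 97.5])   # a rough bootstrap interval
```

The key property making this an *online* method is that each transition is processed once, updating the point estimate and all replicates in a single pass.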
    Understanding Dynamics of Nonlinear Representation Learning and Its Application. (arXiv:2106.14836v3 [cs.LG] UPDATED)
    Representations of the world environment play a crucial role in artificial intelligence. It is often inefficient to conduct reasoning and inference directly in the space of raw sensory representations, such as pixel values of images. Representation learning allows us to automatically discover suitable representations from raw sensory data. For example, given raw sensory data, a deep neural network learns nonlinear representations at its hidden layers, which are subsequently used for classification at its output layer. This happens implicitly during training through minimizing a supervised or unsupervised loss. In this paper, we study the dynamics of such implicit nonlinear representation learning. We identify a new assumption and a novel condition, called the common model structure assumption and the data-architecture alignment condition, respectively. Under the common model structure assumption, the data-architecture alignment condition is shown to be sufficient for global convergence and necessary for global optimality. Moreover, our theory explains how and when increasing the network size does and does not improve the training behaviors in the practical regime. Our results provide practical guidance for designing a model structure: e.g., the common model structure assumption can be used as a justification for using a particular model structure instead of others. We also derive a new training framework, which satisfies the data-architecture alignment condition by automatically modifying any given training algorithm. Given a standard training algorithm, the framework running its modified version is empirically shown to maintain competitive test performance while providing global convergence guarantees for deep residual neural networks with convolutions, skip connections, and batch normalization on datasets including MNIST, CIFAR-10, CIFAR-100, Semeion, KMNIST and SVHN.  ( 3 min )
    VNIbCReg: VICReg with Neighboring-Invariance and better-Covariance Evaluated on Non-stationary Seismic Signal Time Series. (arXiv:2204.02697v2 [cs.LG] UPDATED)
    One of the latest self-supervised learning (SSL) methods, VICReg, showed great performance in both linear and fine-tuning evaluation. However, VICReg was proposed for computer vision: it learns by pulling together representations of random crops of an image while maintaining the representation space via variance and covariance losses. As a result, VICReg can be ineffective on non-stationary time series, where different parts/crops of the input should be encoded differently to account for the non-stationarity. Another recent SSL proposal, Temporal Neighborhood Coding (TNC), is effective for encoding non-stationary time series. This study shows that a combination of a VICReg-style method and TNC is very effective for SSL on non-stationary time series, where a non-stationary seismic signal time series is used as the evaluation dataset.  ( 2 min )
    Bidimensional linked matrix factorization for pan-omics pan-cancer analysis. (arXiv:2002.02601v2 [stat.ML] UPDATED)
    Several modern applications require the integration of multiple large data matrices that have shared rows and/or columns. For example, cancer studies that integrate multiple omics platforms across multiple types of cancer, pan-omics pan-cancer analysis, have extended our knowledge of molecular heterogeneity beyond what was observed in single-tumor and single-platform studies. However, these studies have been limited by available statistical methodology. We propose a flexible approach to the simultaneous factorization and decomposition of variation across such bidimensionally linked matrices, BIDIFAC+. This decomposes variation into a series of low-rank components that may be shared across any number of row sets (e.g., omics platforms) or column sets (e.g., cancer types). This builds on a growing literature for the factorization and decomposition of linked matrices, which has primarily focused on multiple matrices that are linked in one dimension (rows or columns) only. Our objective function extends nuclear norm penalization, is motivated by random matrix theory, gives an identifiable decomposition under relatively mild conditions, and can be shown to give the mode of a Bayesian posterior distribution. We apply BIDIFAC+ to pan-omics pan-cancer data from TCGA, identifying shared and specific modes of variability across 4 different omics platforms and 29 different cancer types.  ( 2 min )
    Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning. (arXiv:2004.10888v6 [cs.LG] UPDATED)
    We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable. MVPI enjoys great flexibility in that any policy evaluation method and risk-neutral control method can be dropped in for risk-averse control off the shelf, in both on- and off-policy settings. This flexibility reduces the gap between risk-neutral control and risk-averse control and is achieved by working on a novel augmented MDP directly. We propose risk-averse TD3 as an example instantiating MVPI, which outperforms vanilla TD3 and many previous risk-averse control methods in challenging Mujoco robot simulation tasks under a risk-aware performance metric. This risk-averse TD3 is the first to introduce deterministic policies and off-policy learning into risk-averse reinforcement learning, both of which are key to the performance boost we show in Mujoco domains.  ( 2 min )
    DeepTensor: Low-Rank Tensor Decomposition with Deep Network Priors. (arXiv:2204.03145v1 [stat.AP])
    DeepTensor is a computationally efficient framework for low-rank decomposition of matrices and tensors using deep generative networks. We decompose a tensor as the product of low-rank tensor factors (e.g., a matrix as the outer product of two vectors), where each low-rank tensor is generated by a deep network (DN) that is trained in a self-supervised manner to minimize the mean-squared approximation error. Our key observation is that the implicit regularization inherent in DNs enables them to capture nonlinear signal structures (e.g., manifolds) that are out of the reach of classical linear methods like the singular value decomposition (SVD) and principal component analysis (PCA). Furthermore, in contrast to the SVD and PCA, whose performance deteriorates when the tensor's entries deviate from additive white Gaussian noise, we demonstrate that the performance of DeepTensor is robust to a wide range of distributions. We validate that DeepTensor is a robust and computationally efficient drop-in replacement for the SVD, PCA, nonnegative matrix factorization (NMF), and similar decompositions by exploring a range of real-world applications, including hyperspectral image denoising, 3D MRI tomography, and image classification. In particular, DeepTensor offers a 6dB signal-to-noise ratio improvement over standard denoising methods for signals corrupted by Poisson noise and learns to decompose 3D tensors 60 times faster than a single DN equipped with 3D convolutions.  ( 2 min )
    MultiAuto-DeepONet: A Multi-resolution Autoencoder DeepONet for Nonlinear Dimension Reduction, Uncertainty Quantification and Operator Learning of Forward and Inverse Stochastic Problems. (arXiv:2204.03193v1 [stat.ML])
A new data-driven method for operator learning of stochastic differential equations (SDEs) is proposed in this paper. The central goal is to solve forward and inverse stochastic problems more effectively using limited data. The deep operator network (DeepONet) has been proposed recently for operator learning; in contrast to neural networks that learn functions, it targets the problem of learning nonlinear operators. However, the original model can struggle to learn nonlinear operators for high-dimensional stochastic problems. We propose a new multi-resolution autoencoder DeepONet model, referred to as MultiAuto-DeepONet, to deal with this difficulty with the aid of a convolutional autoencoder. The encoder part of the network is designed to reduce the dimensionality as well as discover the hidden features of high-dimensional stochastic inputs. The decoder is designed to have a special structure, i.e. the form of a DeepONet. The first DeepONet in the decoder reconstructs the input function involving randomness, while the second approximates the solution of the desired equations. The two DeepONets share a common branch net and have two independent trunk nets. This architecture enables us to deal with multi-resolution inputs naturally. By adding $L_1$ regularization to our network, we found that the outputs from the branch net and the two trunk nets all have sparse structures. This reduces the number of trainable parameters in the neural network, thus making the model more efficient. Finally, we conduct several numerical experiments to illustrate the effectiveness of our proposed MultiAuto-DeepONet model with uncertainty quantification.  ( 2 min )
    A novel nonconvex, smooth-at-origin penalty for statistical learning. (arXiv:2204.03123v1 [stat.ML])
Nonconvex penalties are utilized for regularization in high-dimensional statistical learning algorithms primarily because they yield unbiased or nearly unbiased estimators for the parameters in the model. Nonconvex penalties existing in the literature, such as SCAD, MCP, Laplace, and arctan, have a singularity at the origin, which also makes them useful for variable selection. However, in several high-dimensional frameworks, such as deep learning, variable selection is less of a concern. In this paper, we present a nonconvex penalty which is smooth at the origin. The paper includes asymptotic results for ordinary least squares estimators regularized with the new penalty function, showing asymptotic bias that vanishes exponentially fast. We also conducted an empirical study employing deep neural network architectures on three datasets and convolutional neural networks on four datasets. The empirical study showed better performance for the new regularization approach in five out of the seven datasets.  ( 2 min )
    Statistical Model Criticism of Variational Auto-Encoders. (arXiv:2204.03030v1 [cs.LG])
    We propose a framework for the statistical evaluation of variational auto-encoders (VAEs) and test two instances of this framework in the context of modelling images of handwritten digits and a corpus of English text. Our take on evaluation is based on the idea of statistical model criticism, popular in Bayesian data analysis, whereby a statistical model is evaluated in terms of its ability to reproduce statistics of an unknown data generating process from which we can obtain samples. A VAE learns not one, but two joint distributions over a shared sample space, each exploiting a choice of factorisation that makes sampling tractable in one of two directions (latent-to-data, data-to-latent). We evaluate samples from these distributions, assessing their (marginal) fit to the observed data and our choice of prior, and we also evaluate samples through a pipeline that connects the two distributions starting from a data sample, assessing whether together they exploit and reveal latent factors of variation that are useful to a practitioner. We show that this methodology offers possibilities for model selection qualitatively beyond intrinsic evaluation metrics and at a finer granularity than commonly used statistics can offer.  ( 2 min )
    What You See is What You Get: Distributional Generalization for Algorithm Design in Deep Learning. (arXiv:2204.03230v1 [cs.LG])
    We investigate and leverage a connection between Differential Privacy (DP) and the recently proposed notion of Distributional Generalization (DG). Applying this connection, we introduce new conceptual tools for designing deep-learning methods that bypass "pathologies" of standard stochastic gradient descent (SGD). First, we prove that differentially private methods satisfy a "What You See Is What You Get (WYSIWYG)" generalization guarantee: whatever a model does on its train data is almost exactly what it will do at test time. This guarantee is formally captured by distributional generalization. WYSIWYG enables principled algorithm design in deep learning by reducing $\textit{generalization}$ concerns to $\textit{optimization}$ ones: in order to mitigate unwanted behavior at test time, it is provably sufficient to mitigate this behavior on the train data. This is notably false for standard (non-DP) methods, hence this observation has applications even when privacy is not required. For example, importance sampling is known to fail for standard SGD, but we show that it has exactly the intended effect for DP-trained models. Thus, with DP-SGD, unlike with SGD, we can influence test-time behavior by making principled train-time interventions. We use these insights to construct simple algorithms which match or outperform SOTA in several distributional robustness applications, and to significantly improve the privacy vs. disparate impact trade-off of DP-SGD. Finally, we also improve on known theoretical bounds relating differential privacy, stability, and distributional generalization.  ( 2 min )
    Composite Spatial Monte Carlo Integration Based on Generalized Least Squares. (arXiv:2204.03248v1 [stat.CO])
Although the evaluation of expectations on the Ising model is essential in various applications, it is frequently infeasible because of intractable multiple summations (or integrations). Spatial Monte Carlo integration (SMCI) is a sampling-based approximation that can provide high-accuracy estimates of such intractable expectations. To evaluate the expectation of a function of the variables in a specific region (called the target region), SMCI considers a larger region containing the target region (called the sum region). In SMCI, the multiple summation over the variables in the sum region is executed exactly, and that over the outer region is evaluated by a sampling approximation such as standard Monte Carlo integration. The accuracy of the SMCI estimator is guaranteed to improve monotonically as the size of the sum region increases. However, a haphazard expansion of the sum region can cause a combinatorial explosion, so we aim to improve the accuracy without such region expansion. In this study, based on the theory of generalized least squares, a new effective method is proposed by combining multiple SMCI estimators. The validity of the proposed method is demonstrated theoretically and numerically. The results indicate that the proposed method can be effective in the inverse Ising problem (or Boltzmann machine learning).  ( 2 min )
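The generalized least squares pooling idea behind the paper can be sketched generically: given several unbiased estimates of the same quantity with a known (or estimated) covariance, the minimum-variance unbiased combination weights them by the inverse covariance. (This is a textbook GLS sketch, not the paper's specific SMCI estimator.)

```python
import numpy as np

def gls_combine(estimates, cov):
    """GLS pooling sketch: for unbiased estimates y of the same scalar with
    covariance S, the minimum-variance combination is w @ y with
    w = S^{-1} 1 / (1^T S^{-1} 1), i.e. weights summing to one."""
    s_inv_ones = np.linalg.solve(cov, np.ones(len(estimates)))
    w = s_inv_ones / s_inv_ones.sum()
    return w @ estimates, w
```

For two independent estimators with variances 1 and 4 this reduces to inverse-variance weighting (weights 0.8 and 0.2); correlated estimators, as in the paper's combined SMCI setting, receive non-trivial weights from the off-diagonal covariance terms.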

  • Open

    [D] Any guesstimates for how much a DALLE 2 generation will eventually cost?
Just based on the estimated running costs of GPT-3, and then whatever profit gets applied on top of that, are there any estimates for what OpenAI will eventually charge for image generation? submitted by /u/EugeneJudo [link] [comments]  ( 1 min )
    Problem with CVPR template and arXiv? [D]
I don't know what would be the best place to post this, but I am having trouble uploading an Overleaf manuscript to arXiv based on the CVPR 2022 template. I am getting the following error. Does anyone have any ideas? https://preview.redd.it/y9jq0rbysds81.png?width=1614&format=png&auto=webp&s=16be3d32468837f649e846ec8a309dab2854c762 submitted by /u/avd4292 [link] [comments]
[R] Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language - Google Apr 2022
    Paper: https://arxiv.org/abs/2204.00598 https://socraticmodels.github.io/ Twitter: https://twitter.com/andyzengtweets/status/1512089759497269251 Abstract: " Large foundation models can exhibit unique capabilities depending on the domain of data they are trained on. While these domains are generic, they may only barely overlap. For example, visual-language models (VLMs) are trained on Internet-scale image captions, but large language models (LMs) are further trained on Internet-scale text with no images (e.g. from spreadsheets, to SAT questions). As a result, these models store different forms of commonsense knowledge across different domains. In this work, we show that this model diversity is symbiotic, and can be leveraged to build AI systems with structured Socratic dialogue -- in whi…  ( 1 min )
    [D] What to do next after the sanity check?
I have two years of time-series data taken from two sensors, which I have split into 80/10/10 non-overlapping train/val/test splits. The task is to denoise one sensor's data into the other's, and I am handling it as a regression problem. I am following this website and considered an already published model (5 convolutional and 1 fully connected layer) which is trained on a similar dataset and the same task. As a sanity check, per the website, I trained the model on a subset of the trainset (3 months) and tried to overfit it (while evaluating on the complete val set), which works fine. However, I am not sure what to do next from this point on. Shall I just train on the complete trainset now? Or do I increase the layers or play with other hyperparams to find out more about my regression problem/data? I would really appreciate your comments. Thank you. PS: The target value is sparse, i.e. more than 85% of the time it is zero. submitted by /u/muaz_usmani [link] [comments]  ( 1 min )
    [D] Leaving ML for Software Engineering?
I'm keen to hear from people who have made the transition from ML research/engineering positions to software engineering roles (or who are considering it). What were your reasons for doing it, and did you regret it? I see so many articles about transitioning from software to ML but none about going the opposite direction. Context: I've been working as an ML engineer for a little over a year, and I'm just... not enjoying it. I want to love my job so badly as I like my boss, my colleagues, and the company (and I'm paid quite well for my level), but I just don't. I feel like the type of work I'm doing is not very smart and yet it's extremely draining: I spend so many hours just looking at loss curves, tweaking features and parameters. I'm somehow bored and stressed at the same time, because I don't enjoy the work and yet I feel the pressure to produce good models, and when they don't work as expected I can't help but take it personally, as if they would work if I just tried hard enough. I find that on the days where I end up having to take care of more purely engineering tasks I just have a lot more fun and I finish the day more satisfied and less drained. I think I just want to build something instead of spending hours banging my head against shit data. I would love to hear from people who feel or have felt the same way, because whenever I speak about this with friends who are in ML they look at me like I'm a lunatic for wanting to leave it for software engineering. I'm obviously aware that SWE roles are not all fun and games, but I just feel like there's been an excessive push for so many people to move to ML as it's "cool" and "smart", when in reality they're just different things that are going to suit different people. submitted by /u/hedy-m [link] [comments]  ( 5 min )
    [D] Is virtual ICLR 2022 worth paying for
    The 2022 ICLR conference at the end of this month is virtual and costs $100 to attend. I was thinking of attending for networking opportunities but I’m not sure. Is it a good idea to go for it? submitted by /u/sybar142857 [link] [comments]  ( 1 min )
    [D] Triplet vs. Contrastive Loss
    The online triplet mining strategy is more efficient than the offline one. It implies "getting a batch of n samples and their associated labels, and form triplets on the fly." Here is an article about Triplet vs. Contrastive Loss comparison and its efficient implementation. I would like to know your feedback. submitted by /u/devzaya [link] [comments]
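The "batch-hard" flavor of online mining the article compares can be sketched in a few lines: for every anchor in the batch, take its farthest positive and nearest negative, then apply the triplet hinge. (A simplified numpy illustration, not the article's implementation.)

```python
import numpy as np

def batch_hard_triplet_loss(emb, labels, margin=0.2):
    """Online ('batch-hard') triplet mining sketch: triplets are formed on
    the fly from a batch of embeddings and labels, using the hardest
    positive and hardest negative per anchor."""
    # pairwise Euclidean distances, shape (batch, batch)
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    eye = np.eye(len(labels), dtype=bool)
    hardest_pos = np.where(same & ~eye, d, -np.inf).max(axis=1)  # farthest positive
    hardest_neg = np.where(~same, d, np.inf).min(axis=1)         # nearest negative
    return np.maximum(hardest_pos - hardest_neg + margin, 0.0).mean()
```

With well-separated classes the loss is zero; shuffling the labels so classes overlap makes it positive, which is what drives the embedding apart during training.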
    [P] Animated Character Generator
    Hello everybody, I'd like to share the latest machine learning project of mine. It allows one to generate animated characters in the style of old video game consoles. Here are some examples. I would appreciate any feedback. https://i.redd.it/sig6ilpi8bs81.gif https://i.redd.it/xaf906qi8bs81.gif https://i.redd.it/8v7lz2qi8bs81.gif submitted by /u/ie9res [link] [comments]
    [D] Annotation formats for image annotations?
    Hey ML people, what is your favorite annotation format for image bounding boxes/labels? I know coco is very popular, we are rethinking parts of our data infrastructure wondering what everyone is using. Our platform hosts hundreds of millions of images. Ideal format would support running queries on data stored in a Data lake If the format supports 3D annotation types that is even better. Thanks for your insights in advance. submitted by /u/mmuppidi [link] [comments]  ( 1 min )
    [N] OpenAI's DALL-E 2 paper "Hierarchical Text-Conditional Image Generation with CLIP Latents" has been updated with added section "Training details" (see Appendix C)
    New version of paper is linked to in the DALL-E 2 blog post and also here (pdf file format). Tweet announcing updated paper. Older version of paper (pdf file format). Original Reddit post. submitted by /u/Wiskkey [link] [comments]  ( 1 min )
Dense Passage Retriever (DPR) Open-QA System [P]
Hi, I made a video explaining the Dense Passage Retriever (DPR) paper. We specifically explain the end-to-end QA system suggested in the latter part of the paper, which discusses how to build an Open-QA system using dense retrievers. DPR was one of the first papers that discussed building dense retrievers using QA pairs only and didn't require a big pretraining computational setup like ORQA or REALM. It is currently used in a lot of places as a dense retriever. You can find Huggingface and Haystack implementations as well. This video is part of a series on Open-QA using dense retrievers. We have made 2 videos on DPR; in the latter, we discuss how to build a dense retriever from scratch. Thanks for the support, and it would be great if you could give any feedback. https://www.youtube.com/watch?v=rvcyyJNjPU0 submitted by /u/infiniteakashe [link] [comments]  ( 1 min )
    [D] Works that can process variable input resolution of images
Hi. I'm looking for existing computer vision papers/networks that can process variable input resolutions. Can anyone point me to similar work? For example, a network/layer N that can take inputs of both H×W and 2H×2W individually and give a correct prediction. One that I know of is the RoI pooling used in Faster R-CNN. Thanks very much. submitted by /u/vincent341 [link] [comments]  ( 1 min )
    [D] Machine Learning Engineers - What Does Your Day Involve?
Hey, I'm looking to transition from my current role as a data scientist to one with a machine learning engineering focus. I was wondering if anyone could provide insights into how they plan their day, or what activities you do throughout the day/week. I'd be particularly interested to understand the balance between deploying models/writing production-worthy code and your time spent learning/developing, given the field is moving so fast. submitted by /u/MenArePigs69 [link] [comments]  ( 4 min )
    [D] Bayesian Non-Parametrics for Ranking?
I am currently stuck on a difficult machine-learning problem and have found no literature on how to solve it. I am given n datapoints x_1,...,x_n that are ordered according to a ranking preference rank(x_1)<rank(x_2)<...<rank(x_n). I am assuming there exists a function f such that f(x_i)<f(x_{i+1}). I am now searching for a Bayesian non-parametric model that gives the posterior probability of functions f satisfying f(x_i)<f(x_{i+1}), so that I can estimate the relative rank preferences at new points. I have tried out a few things. The naive approach is using a GP prior on f. Unfortunately, computing the posterior distribution p(f(x_1), ..., f(x_n) | f(x_1)<...<f(x_n)) has no closed-form solution (it is a normal distribution with N linear constraints, which is absolutely terrible to sample from). This makes computing conditional distributions for predictions very challenging. I am currently approximating the solution by using a GP regression model with label y_i = rank(x_i) = i. But this systematically under-estimates the shape variation, because it adds the assumption that function values between ranks are equidistant. Is there any known approach for doing this? submitted by /u/Ulfgardleo [link] [comments]  ( 2 min )
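As a quick illustration of why the constrained GP posterior in this post is awkward: even naive rejection sampling of the prior subject to f(x_1) < ... < f(x_n) discards most draws, and the acceptance rate collapses as n grows. (A toy sketch; the inputs, kernel, and lengthscale are assumptions, not from the post.)

```python
import numpy as np

# Draw GP prior samples at the ranked inputs and keep only those
# satisfying f(x_1) < f(x_2) < f(x_3) < f(x_4).
rng = np.random.default_rng(1)
x = np.array([0.0, 1.0, 2.0, 3.0])
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)    # RBF kernel, ell = 1
L = np.linalg.cholesky(K + 1e-8 * np.eye(len(x)))    # jitter for stability
samples = (L @ rng.standard_normal((len(x), 5000))).T  # 5000 prior draws
monotone = samples[np.all(np.diff(samples, axis=1) > 0, axis=1)]
accept_rate = len(monotone) / len(samples)
```

Rejection is tolerable for n = 4 but hopeless at realistic n, which is why the constrained-Gaussian sampling the poster mentions is the hard part.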
    [R] Video Diffusion Models
    From the webpage: We present results on video generation using diffusion models. We propose an architecture for video diffusion models which is a natural extension of the standard image architecture. We show that this architecture is effective for jointly training from image and video data. To generate long and higher resolution videos we introduce a new conditioning technique that performs better than previously proposed methods. We present results on text-conditioned video generation and state-of-the-art results on an unconditional video generation benchmark. Paper: https://arxiv.org/abs/2204.03458 https://video-diffusion.github.io/ submitted by /u/hardmaru [link] [comments]  ( 1 min )
    [D] how to decide publication venue
How do you decide if a paper is appropriate for a specific venue? Moreover, how would you characterize the difference between a good NeurIPS publication and a good CVPR or ICCV publication? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
[N] LR Warmup for PyTorch
pytorch_warmup v0.1.0 was released: RadamWarmup + CosineAnnealingLR + StepLR (Colab link). submitted by /u/TonyY_RIMCS [link] [comments]
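The schedule such warmup-plus-annealing schedulers compose can be sketched as a plain formula: ramp the learning rate up linearly, then decay it with a cosine. (A generic sketch of the pattern, not the pytorch_warmup package's API.)

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr):
    """Linear warmup to base_lr over warmup_steps, then cosine annealing
    down to zero by total_steps. Generic formula, framework-agnostic."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps      # linear ramp
    t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * min(t, 1.0)))
```

In a training loop one would call this each step and assign the result to the optimizer's learning rate; library schedulers do the same bookkeeping behind a `step()` interface.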
  • Open

    RL for dynamic environments
    In their 2019 review article in Nature Machine Intelligence, Neftci and Averbeck point out, “Most work in biological systems has focused on simple learning problems… where flexibility and ongoing learning are important, similar to real-world learning problems. In contrast, most work in artificial agents has focused on learning a single complex problem in a static environment.” Are there RL approaches designed to handle dynamic environments with changing reward functions? I did find this earlier post, but thought I'd ask if anyone had other suggested lines of reading. Thanks! submitted by /u/Careless-Argument-37 [link] [comments]  ( 1 min )
Observation space max size?
I want to give my AI as much information as possible. Can there be an issue with a too-large observation space? submitted by /u/Willing-Classroom735 [link] [comments]  ( 1 min )
    "UC Berkeley’s Pieter Abbeel receives 2021 ACM Prize in Computing" (for DRL robotics)
    submitted by /u/gwern [link] [comments]
    Action Space Dimensional reduction for better convergence
I am working on a project in which a robot learns its motion; for example, a bipedal robot learns to walk in a straight line by learning to adjust the torque and angular velocity of each joint. However, the robot I am working on has a complex architecture: it has 10 joints instead of 2, and most importantly, all of these joints work simultaneously and coherently to produce a desired motion. The problem I am facing is that the robot has ten joints and each joint can move between -450 and +450. For simplicity, let me define the states and actions of the system: State = -450 to +450 -------------> normalization ------------------> -10 to 10. Actions = choose an angle between -10 and 10 for each joint. Total action space for each state at each time step = 10 (no. of joints, each of which can move between -10 and 10 at each time step) * 360 (total time steps for a single motion) = 3600 (output: no. of angles required to generate a motion). I am using TD3 to solve this conundrum. The action space is too large; how can I reduce it? submitted by /u/SAM_Baloch [link] [comments]  ( 2 min )
    Dynamic action space in RL
I am doing a project and there is a problem with a dynamic action space. The complete action space can be divided into four parts, and in each state the action to be selected comes from one of them. For example, the total discrete action space length is 1000, divided into four parts: [0:300], [301:500], [501:900], [901:1000]. For state 1, the action_space is [0:300]; for state 2, the action_space is [301:500], etc. I have several ideas at present: 1) No restriction at all: the legal actions of all states are [1:1000], but it may take longer to train and there is not much innovation. 2) Soft constraint: for example, if state 1 selects an illegal action, such as one action in [251:500], the reward gives a negative value, but this is also not innovative. 3) Hard constraint: use an action-space mask in each state, but I don't know how to do it. Is there any relevant article? 4) Divide directly into four action spaces and use multi-agent cooperative learning. Any suggestions? thanks! submitted by /u/RangerWYR [link] [comments]  ( 2 min )
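For the "hard constraint" option above, the usual trick is to set illegal actions' logits to negative infinity before the softmax, so they receive exactly zero probability (and zero gradient). A minimal framework-agnostic sketch (an actual RL library's policy head would differ):

```python
import numpy as np

def masked_action_probs(logits, legal):
    """Action-mask sketch: `legal` is a boolean mask over the discrete
    action space for the current state; illegal actions get -inf logits
    and thus zero softmax probability."""
    masked = np.where(legal, logits, -np.inf)
    z = masked - masked.max()     # stabilize the softmax
    e = np.exp(z)
    return e / e.sum()
```

Per-state masks like the poster's four ranges are just precomputed boolean arrays indexed by state; the same masking works for Q-values by taking the argmax over legal entries only.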
    Any paper suggestions??
Hi everyone, I have to define a project for my master's degree, so I'm looking for the best papers published from 2018-2019 until now in reinforcement learning. Do you have any suggestions, titles, or projects that I can check? submitted by /u/acaviedes15 [link] [comments]  ( 1 min )
  • Open

    Responsible AI in a Global Context
    submitted by /u/john133435 [link] [comments]
    AI website that transitions photos into video?
I remember using a website about a year ago where you could put in 2 or more images, and it would make a sort of transition between them with AI. Then you could export the video and such. You could also very extensively edit human faces and change small features on a scale from 1-100. The features were incredibly specific, like brow bone and nasal bridge. If anyone has the website I would appreciate it!! submitted by /u/yungbenz0_bajs [link] [comments]  ( 1 min )
    How Artificial Intelligence Is Impacting Today’s Businesses
    submitted by /u/mr_j_b [link] [comments]
    Alibaba’s AI tool to improve efficiency of China’s waste-to-energy plants
    submitted by /u/mr_j_b [link] [comments]
    Best GAN for Tabular-data
What, in your opinion, is the best GAN for tabular data? Please include any references if you have any. submitted by /u/ily_jk [link] [comments]
    Supercharged UI for MLflow
Hi guys, we've built a plugin that seamlessly reads MLflow logs and provides a beautiful UI to compare multiple runs with just a few clicks. You can: filter runs with a super versatile, fully pythonic search; group and aggregate your metrics/images. We are trying to make it work seamlessly with MLflow and complement its other awesome features 🎉 Here is more info about it: https://aimstack.io/aimlflow Would love your feedback!! submitted by /u/ManeSa [link] [comments]  ( 1 min )
    Takeaways From 3 Years Working In Machine Learning
    submitted by /u/elcric_krej [link] [comments]
    OpenAI 's new model DALL·E 2 is amazing!
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
    The AI in a jar
    submitted by /u/bendee983 [link] [comments]
    Can Computers Learn Common Sense?
    submitted by /u/estasfuera [link] [comments]
    Metaverse weekly digest: Shiba Inu’s metaverse, Alibaba’s $60 million VR investment
    submitted by /u/bent_out_of_shape_ [link] [comments]
    Meet ‘ChestLink’, The First Autonomous AI Medical Imaging Application by ‘Oxipit’ That Received CE Mark Approval in the EU
The most common diagnostic imaging test conducted in emergency rooms is chest radiography. Providing automated preliminary read helpers to physicians might speed up surgery, enhance accuracy, and lower healthcare costs. An artificial intelligence tool that interprets chest X-rays without the intervention of a radiologist received regulatory approval in the European Union this week, marking a first for a wholly autonomous medical imaging AI, according to ‘Oxipit‘, the developer of this tool. It’s a watershed moment for AI, and it’s more than likely to spark debate, given that radiologists have spent the last few years working to fully automate parts of their jobs. Continue Reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
  • Open

    Large-Scale Matrix Factorization on TPUs
    Posted by Harsh Mehta, Software Engineer, Google Research Matrix factorization is one of the oldest, yet still widely used, techniques for learning how to recommend items such as songs or movies from user ratings. In its basic form, it approximates a large, sparse (i.e., mostly empty) matrix of user-item interactions with a product of two smaller, denser matrices representing learned item and user features. These dense matrices, in turn, can be used to recommend items to a user with which they haven't interacted before. Despite its algorithmic simplicity, matrix factorization can still achieve competitive performance in recommender benchmarks. Alternating least squares (ALS), and especially its implicit variation, is a fundamental algorithm to learn the parameters of matrix factorization…  ( 9 min )
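The alternating least squares update described above can be sketched for the simplest case: with item factors fixed, each user's factor vector is the solution of a ridge least-squares problem. (A simplification assuming fully observed explicit ratings; the implicit variant the post highlights weights unobserved entries differently.)

```python
import numpy as np

def als_user_step(R, F, lam):
    """One ALS half-step: solve for all user factors given item factors F.
    R: (n_users, n_items) dense ratings; F: (n_items, k) item factors.
    Each row of the result minimizes ||R_u - F u||^2 + lam ||u||^2."""
    k = F.shape[1]
    A = F.T @ F + lam * np.eye(k)            # shared k x k normal matrix
    return np.linalg.solve(A, F.T @ R.T).T   # (n_users, k)
```

The symmetric item half-step is the same computation with roles swapped; alternating the two drives the factorization toward the ratings matrix. With lam = 0 and exactly factorizable ratings, one half-step already reproduces R from the true item factors.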
  • Open

    Rock On: Scientists Use AI to Improve Sequestering Carbon Underground
A team of scientists have created a new AI-based tool to help lock up greenhouse gases like CO2 in porous rock formations faster and more precisely than ever before. Carbon capture technology, also referred to as carbon sequestration, is a climate change mitigation method that redirects CO2 emitted from power plants back underground. While doing… Read article > The post Rock On: Scientists Use AI to Improve Sequestering Carbon Underground appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    Build a custom entity recognizer for PDF documents using Amazon Comprehend
    In many industries, it’s critical to extract custom entities from documents in a timely manner. This can be challenging. Insurance claims, for example, often contain dozens of important attributes (such as dates, names, locations, and reports) sprinkled across lengthy and dense documents. Manually scanning and extracting such information can be error-prone and time-consuming. Rule-based software […]  ( 7 min )
    Getting started with the Amazon Kendra Box connector
    Amazon Kendra is a highly accurate and easy-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides. For many organizations, Box Content Cloud is a core part of their content storage and lifecycle management […]  ( 6 min )
  • Open

    Hilbert transform and Mathematica
The Hilbert transform of a function f(t) is a function fH(x) defined [1] by $f_H(x) = \frac{1}{\pi}\,\mathrm{p.v.}\!\int_{-\infty}^{\infty} \frac{f(t)}{x - t}\, dt$. The integral must be interpreted in the sense of the Cauchy principal value: the integrand is not absolutely integrable because of the singularity at x, and so the value of the integral depends on how you handle the singularity. The Cauchy […] Hilbert transform and Mathematica first appeared on John D. Cook.  ( 2 min )
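A quick numerical sanity check of the Hilbert transform (a sketch assuming a uniformly sampled, periodic signal, not code from the post): in the frequency domain the transform acts as multiplication by -i·sgn(frequency), so the transform of a cosine is a sine.

```python
import numpy as np

def hilbert_fft(x):
    """Discrete Hilbert transform via the FFT: multiply positive
    frequencies by -i and negative frequencies by +i (DC is zeroed)."""
    X = np.fft.fft(x)
    sign = np.sign(np.fft.fftfreq(len(x)))
    return np.fft.ifft(-1j * sign * X).real

t = np.linspace(0, 1, 256, endpoint=False)
x = np.cos(2 * np.pi * 4 * t)
hx = hilbert_fft(x)   # matches sin(2*pi*4*t) to machine precision
```

This frequency-domain form sidesteps the principal-value singularity entirely, which is also how numerical packages typically compute the transform.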
    Visual integration
    The plot below is of a meromorphic function f(z). That is, the function f(z) is analytic except possibly at poles, and the colors represent the phase angles, the values of θ if you write the function values in polar form. What is the value of the integral where C is the perimeter of the square? […] Visual integration first appeared on John D. Cook.  ( 2 min )
  • Open

    Artificial Intelligence: Benefits for Automation Testing
AI has been making a lot of noise of late, especially in the context of software development. Of course, this topic is quite wide, but in this article, we shall focus our attention on AI-driven automation testing. Let us start with understanding what AI and automation testing are. Automation testing refers to the process of… Read More »Artificial Intelligence: Benefits for Automation Testing The post Artificial Intelligence: Benefits for Automation Testing appeared first on Data Science Central.  ( 3 min )

  • Open

    Attend the 2022 National Autonomous Vehicle Expo (April 16-17th)
Interested in the future of autonomous vehicles? Want to know more about the impacts of this technology? Join us on April 16-17th at the 2022 National Autonomous Vehicle Expo to discover the engineering, ethics, and policymaking of this emerging technology. The virtual expo consists of speaker and workshop sessions led by industry-leading companies, such as NVIDIA, Waymo, and Motional, as well as distinguished programs/organizations like MIT Beaverworks and InspiritAI. You will also have the opportunity to compete in our hackathon, where you can win a variety of cool prizes! Even if you don't participate in the hackathon, there will be free merchandise and giveaways throughout the expo! To register and/or view more information about the event, head over to avexpo.org. For hackathon-specific registration, you can visit our devpost at https://autonomous-vehicle-expo.devpost.com/. Hope to see you all there! submitted by /u/avexpo22 [link] [comments]  ( 1 min )
    Resources about cognitive theories
    Hi! I am new to the community, and was wondering what y'all's favorite resources were to learn about cognitive theories and how they will shape future AI advancements. YouTube channels would be great. submitted by /u/Apprehensive-Candy97 [link] [comments]
    Andrew Yang & Yuval Noah Harari: Tech, Public Policy & the Future of Work
    submitted by /u/john133435 [link] [comments]
    AI News | AI News | Why AI Made 40,000 New Chemical Weapons Compounds in 6 Hours | Cancer Treatment AI Breakthrough
    submitted by /u/getrich_or_diemining [link] [comments]
    How to create a bot for an existing game?
    I wanna create a bot for a game which basically is: get resources, craft items, sell them. The problem is, some items have different qualities, and I wanna automate this process to identify the good stuff to keep and sell the bad stuff. What's the best way to do that? I work with desktop systems, so I'm not familiar with this kind of thing, but I usually read about Python and some frameworks. What do you guys recommend I start with? submitted by /u/AbbathDoom [link] [comments]  ( 1 min )
    DALL·E 2: A new AI system to create realistic images and art from natural language commands
    submitted by /u/alien128 [link] [comments]
    What are other "technology" fields that are good to learn while studying AI?
    Hello! What do you guys think are other "technology" fields that would be good to study alongside AI? It is okay as long as it is "tech." What would be the tech field that would be most beneficial in the future? My goal is to make a self-aware AI (AGI). I have been fascinated by AI since my childhood, which is why I'm going to pursue this field. Also, I am currently studying Game Development to make a VR game that hopefully will have humanlike AI in it. I have read a LOT of articles about the future of AI, and Cybersecurity keeps popping up because superintelligent AI needs to be PROTECTED from hackers (based on the articles), otherwise it is over. What do you guys think would be the tech field that will bring the most change in the future? submitted by /u/ThatOneEpicAstronaut [link] [comments]  ( 1 min )
    Does this artificial intelligence think like a human?
    submitted by /u/qptbook [link] [comments]
    Artificial intelligence Courses for Healthcare
    We keep hearing about how artificial intelligence and machine learning are going to revolutionise medicine. But what's hype, and what's realistic? And how can you get involved? The first step is to understand the technology - where it's well suited to healthcare (and where it isn't). When it comes to health care, especially in life-and-death situations, AI has already made many things easier for us, and it is still expected to drastically change the way medicine is practised. It may also see surgeries performed by doctors replaced by AI-assisted surgery, making the diagnosis of complex diseases, genetic issues and many other health problems far easier in the future. Here are the best artificial intelligence courses for healthcare you can take in 2022. submitted by /u/maneesh123456 [link] [comments]  ( 1 min )
    Artificial Nightmares: Smithing Stone 6 || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Introducing MindSpore 1.6 New Features
    submitted by /u/Creative_Habit_6868 [link] [comments]
    OpenAI's DALL·E 2 ! Text-to-Image Generation Explained
    submitted by /u/OnlyProggingForFun [link] [comments]
    How do I get into the field of AI policy and strategy?
    I've read online that a career in AI policy and strategy is heavily needed and is actually ranked as the number one problem area by 80,000 Hours. I am choosing which undergraduate degree to pursue in the fall and I'm not sure of the best pathway to an extremely high-level position in this field. An economics degree? A computer science degree? An AI degree? Should I pursue one subject until I get a PhD in it, or mix it with other degrees/certificates? Is it a straightforward pathway focused on one subject, where I only ever work in that field, or is it necessary to pursue and work in other fields as well? What are the typical steps? Also, if there is anything else that would be helpful along the pathway or anything you would recommend, please let me know. submitted by /u/Key-Lawyer-7586 [link] [comments]  ( 2 min )
    Five Google Chrome Extensions that every Machine Learning / Data Science professional should know about 🚀💯
    submitted by /u/MLtinkerer [link] [comments]  ( 1 min )
  • Open

    Implementation of RL
    Hi all! I am a beginner in the RL field and am trying to implement the RL algorithm in the following paper: [1912.04321] Learning to Code: Coded Caching via Deep Reinforcement Learning (arxiv.org). In short, the goal is to achieve the minimum number of bit transmissions from the server to all users. In the paper, after 500 episodes of training the number of transmissions does decrease. But when I implement the same actor-critic algorithm this does not happen; in fact, my results seem completely random. Here is the plot: https://preview.redd.it/yve3cay7l6s81.png?width=745&format=png&auto=webp&s=36bf4f0aa813a30024128d797acde3b8adc2df30 Although my training parameters are slightly different, I can't understand why this would happen. I used the parameters and pseudocode from this paper: A Deep Reinforcement Learning Approach for Shared Caching | IEEE Conference Publication | IEEE Xplore - which is an extension of the link at the top. Attaching a link to my code: https://www.kaggle.com/samarthtiwari123/rl-for-coded-caching Any help would be appreciated!! Thanks in advance submitted by /u/samt_123 [link] [comments]  ( 1 min )
    How can I extract the direction a specific agent is facing?
    submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Flow of gradients through multiple classes
    Naive question: if I define my RL model as a combination of different classes (one class that preprocesses the observation, one class that processes the observation, one class that outputs the actions, etc.), is this going to affect the flow of gradients in PyTorch? The alternative would be to create only one class in which I combine everything submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Can KL divergence be used as a metric to see the learning progress in PPO?
    In the hyperparameter section of the paper, it is written that the step size of Adam is varied according to the KL divergence. So I wanted to know: is KL divergence the correct metric for observing learning progress? We have many states for which the probability of a particular action is either increased or decreased, so taking the average KL mixes up a lot of things. submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
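    A minimal sketch (not from the paper) of the averaging the post is worried about: the batch-mean approximate KL that PPO implementations typically log, computed here for two hypothetical states whose action probabilities move in opposite directions.

```python
import math

def mean_kl(old_probs, new_probs):
    """Average KL(old || new) over a batch of per-state categorical distributions."""
    total = 0.0
    for p_old, p_new in zip(old_probs, new_probs):
        total += sum(p * math.log(p / q) for p, q in zip(p_old, p_new))
    return total / len(old_probs)

# Two states: one action's probability goes up, the other's goes down,
# yet the mean reduces both movements to a single aggregate number.
old = [[0.5, 0.5], [0.5, 0.5]]
new = [[0.7, 0.3], [0.3, 0.7]]
print(round(mean_kl(old, new), 4))  # → 0.0872
```

    The mean KL is still a useful trust-region diagnostic (it is what the adaptive Adam step size the post mentions thresholds on), but as the post observes it cannot tell you *which* states moved, only how far the policy drifted on average.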
    Weight decay in policy network for Discrete SAC?
    We’re finding that our network is returning a tensor of NaNs towards the end of training. Adding weight decay solves this issue but reduces learning, was wondering if anyone else had experience with vanishing gradients in off-policy methods or any insight? submitted by /u/TerrificJam [link] [comments]  ( 1 min )
  • Open

    [D] self attention visualization
    Has anyone ever come across seemingly chaotic self-attention maps during visualization? If your model is performing well but no insights can be gleaned from the visualization, how do you explain it in a paper? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    [N] PaLM's (Google's 540B LLM) training costs around $9M to $17M.
    Here's the blogpost estimating the cost. What would it cost you to train PaLM using cloud computing (and you're not Google)? Something around $9M to $17M. submitted by /u/cirqe [link] [comments]  ( 1 min )
    [D] Feature selection methods
    I'm working on an ML project with a dataset of 20 columns. For feature selection, I removed columns one by one, looked at the error of the model's outputs for each, kept whichever removal gave a lower error, and repeated. But that didn't seem to help the model at all; the error went down very little. Is this an okay way of doing feature selection, or is there another way that gives better results? I also tried PCA, LDA, and the Pearson correlation method in Python, and those didn't seem to help either - or is this the best I can do? Thanks! submitted by /u/ihshosv [link] [comments]  ( 1 min )
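    The procedure described in the post is greedy backward elimination. A self-contained sketch, where `error_fn` is a hypothetical stand-in for retraining the model on a feature subset and returning its test error:

```python
def backward_elimination(features, error_fn, min_improvement=1e-4):
    """Greedy backward elimination: repeatedly drop the feature whose removal
    lowers the error, stopping when no removal helps enough.
    error_fn(subset) is assumed to retrain the model and return its error."""
    selected = list(features)
    best_error = error_fn(selected)
    improved = True
    while improved and len(selected) > 1:
        improved = False
        for f in list(selected):
            trial = [x for x in selected if x != f]
            err = error_fn(trial)
            if err < best_error - min_improvement:
                best_error, selected, improved = err, trial, True
                break
    return selected, best_error

# Toy error: only features "a" and "b" matter; every extra feature adds noise.
toy_error = lambda subset: abs(len(subset) - 2) + (0 if {"a", "b"} <= set(subset) else 5)
subset, err = backward_elimination(["a", "b", "c", "d"], toy_error)
print(subset, err)  # → ['a', 'b'] 0
```

    scikit-learn's `SequentialFeatureSelector` (with `direction="backward"`) automates the same idea with cross-validated scoring, which is usually more reliable than a single train/test error per subset.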
    [D] Is overfitting a sign of high learning capacity?
    This is a two-part question: If a neural network can overfit a large dataset, is this a sign that the network has high learning capacity? And if a neural network can overfit a dataset with substantially fewer parameters than other neural networks developed for the same learning task, is this a sign that it has high learning capacity relative to those networks? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    [D] Training a CNN with synthetic data: should I mix synthetic and real data and train from scratch, or pretrain on synthetic and finetune on real?
    I'm doing research on the use of synthetic data for a computer vision task, and I have generally always trained in a mixed setting from scratch, but I have noticed that in similar papers, researchers always pretrain on synthetic data first and then finetune on real data. Is there a logic behind that? Should I expect better results from finetuning? submitted by /u/TheManveru [link] [comments]  ( 2 min )
    [R] Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data
    Hi Everyone, We have recently published the code for our AISTATS 2022 paper - Sampling in Dirichlet Process Mixture Models for Clustering Streaming Data ​ Video Segmentation Example In our work, we have proposed a solution for clustering streaming data. Unlike 'standard' clustering scenarios, in the streaming case the data stream is possibly infinite, you cannot backtrack to previously processed points, and the data statistics are dynamic and change over time. Our solution is based on the Dirichlet Process Mixture Model (DPMM), can work with different types of observations, and is very fast, outperforming other methods both in the quality of the results and the speed with which it achieves them. It can even be distributed across several processes and/or machines! Paper: https://dinarior.github.io/papers/Dinari_AISTATS_streaming.pdf Code (Julia Package): https://github.com/BGU-CS-VIL/DPMMSubClustersStreaming.jl Code (Python wrapper): https://github.com/BGU-CS-VIL/dpmmpythonStreaming Notebook (Julia) for creating the video: https://nbviewer.org/github/BGU-CS-VIL/DPMMSubClustersStreaming.jl/blob/main/examples/VideoSeg.ipynb submitted by /u/dinarior [link] [comments]  ( 1 min )
    [D] Best way to handle encoding disconnected graphs at the graph level.
    I am thinking of building a graph classifier that takes in graphs and labels the incoming graph. The dataset of interest to me is RadGraph: https://arxiv.org/abs/2106.14463 The issue I am having is that the graphs in RadGraph are disconnected in nature (on average 20 disconnected components), making it difficult for the various graph encoders I am aware of to do a good job classifying the graphs. submitted by /u/AICoderGamer [link] [comments]  ( 1 min )
    [D] Does someone know how much faster deepspeed's transformer implementation is?
    Implementation here Looks like they manually calculate the gradient? I'm very curious how much of a difference this makes! submitted by /u/fasttosmile [link] [comments]  ( 1 min )
    [R] My research group is publicly sharing its paper presentations! Check it out!
    https://outsystems-ai-reading-group.github.io/ submitted by /u/JClub [link] [comments]
    [R] On-the-fly Strategy Adaptation for ad-hoc Agent Coordination
    submitted by /u/hardmaru [link] [comments]
    [D] Any good free to use DALL-E style datasets?
    Are there any free to use datasets that contain image/annotation pairs in the style OpenAI used to train the DALL-E models? Pretty inspired by DALL-E 2 and think it would be cool to create a tiny less powerful replication submitted by /u/puppet_pals [link] [comments]  ( 1 min )
    [D] TensorFlow tf.range() vs range()
    TLDR: TensorFlow AutoGraph unwraps native Python ranges, baking each value into the graph. This can be an unexpected cause of graph size explosion. This recently caused an issue in my project, so I thought I'd share some more details: https://lukewood.xyz/blog/to-unroll-or-to-not-unroll submitted by /u/puppet_pals [link] [comments]  ( 1 min )
    [R] A benchmarking framework for time-series unsupervised domain adaptation
    Our work "AdaTime: A Systematic Evaluation of Domain Adaptation Algorithms on Time Series Data" is now public. We provide a benchmarking framework named "AdaTime" to fairly evaluate unsupervised domain adaptation (UDA) approaches on time-series data. We find that UDA approaches proposed for visual data can be directly applied to time-series data and still achieve excellent performance, even better than methods specially proposed for time-series UDA. We were impressed by the consistently superior performance of the "DIRT-T" method on all the datasets. We provide the code publicly on GitHub: https://github.com/emadeldeen24/AdaTime submitted by /u/emad_eldeen [link] [comments]  ( 1 min )
    [P][R] Announcing: Dataset & Denoising Shabby Pages Competition
    Into machine learning? Want a chance to earn a new MacBook Pro? Check out the Denoising ShabbyPages competition! The ShabbyPages dataset is being produced as a way to help train, test, and calibrate computer vision machine learning algorithms designed for working with documents. Enter the competition by training a model to remove the noise, and be awarded a MacBook Pro or some swag in the process! Check out the short paper introducing the dataset, and learn more about the competition at denoising-shabby.com. submitted by /u/proofconstruct [link] [comments]  ( 1 min )
    [R] FaceSigns: Semi-Fragile Neural Watermarks for Media Authentication and Countering Deepfakes
    Hi Everyone! We have released the preprint and Google Colab demo for our paper FaceSigns. FaceSigns embeds a secret bit-string as a semi-fragile watermark in the image pixels. The message is recoverable if benign image operations such as color/contrast adjustment, JPEG compression, or Instagram filters are applied. However, the message cannot be decoded if the image has been facially tampered with (e.g. DeepFake manipulation). This selective fragility allows reliable detection of DeepFake manipulations applied to images signed using FaceSigns. Try out our Google Colab demo to see message encoding and decoding using FaceSigns! Paper: https://arxiv.org/abs/2204.01960 Project Webpage: https://shehzeen.github.io/facesigns Demo: https://github.com/paarthneekhara/FaceSignsDemo submitted by /u/LynxCompetitive7637 [link] [comments]  ( 1 min )
    [D] Machine learning models / ideas for Google search ads?
    Hi guys! I work in-house as part of our Google search team. Our ad spend is pretty large (9 figures per year, in USD). We build/manage stuff at scale using SQL, R, JavaScript, and so on, so everything is pretty much "big data" in flavour. Lately I've been more and more interested in data science, and I'm looking to take things to the next level by incorporating machine learning into our workflow. I'd really love to build some useful machine learning models using popular Python libraries such as Pandas, scikit-learn, NumPy, TensorFlow, PyTorch, and so on. Any suggestions on cool, and most importantly useful, machine learning models I could build? (By "useful", I mean something that could help increase profits.) I think some classification, predictive, or recommender models would be great to start with. Cheers! 😄 submitted by /u/TropicalBound111 [link] [comments]  ( 2 min )
  • Open

    VDTTS: Visually-Driven Text-To-Speech
    Posted by Tal Remez, Software Engineer, Google Research and Micheal Hassid, Software Engineer Intern, Google Research Recent years have seen a tremendous increase in the creation and serving of video content to users across the world in a variety of languages and over numerous platforms. The process of creating high quality content can include several stages from video capturing and captioning to video and audio editing. In some cases dialogue is re-recorded (referred to as dialog replacement, post-sync or dubbing) in a studio in order to achieve high quality and replace original audio that might have been recorded in noisy conditions. However, the dialog replacement process can be difficult and tedious because the newly recorded audio needs to be well synced with the video, requiring …  ( 7 min )
  • Open

    Data Observability: Cracking the Code
    ‍What is the shortest distance between two points? A straight line of course. What if there are multiple points? Then, it depends.  A job executed in response to a user action – refreshing a dashboard, aggregating data, building a report, developing an ML algorithm, performing analytics – all require multiple hops through the data ecosystem.… Read More »Data Observability: Cracking the Code The post Data Observability: Cracking the Code appeared first on Data Science Central.  ( 5 min )
  • Open

    NN from Scratch: #2 Initializing parameters | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
  • Open

    Regular expressions and successive approximation
    Regular expressions can do a lot of tasks in practice that they cannot do in theory. That’s because a particular application of regular expressions comes with context and with error tolerance. For example, much has been said about how regular expressions cannot parse HTML. This is strictly true, but it says nothing about how well […] Regular expressions and successive approximation first appeared on John D. Cook.  ( 3 min )
  • Open

    Try This Out: GFN Thursday Delivers Instant-Play Game Demos on GeForce NOW
    GeForce NOW is about bringing new experiences to gamers. This GFN Thursday introduces game demos to GeForce NOW. Members can now try out some of the hit games streaming on the service before purchasing the full PC version — including some finalists from the 2021 Epic MegaJam. Plus, look for six games ready to stream Read article > The post Try This Out: GFN Thursday Delivers Instant-Play Game Demos on GeForce NOW appeared first on NVIDIA Blog.  ( 3 min )

  • Open

    Question about Model Predictive Control (MPC) cost function
    To my understanding, the cost function is the error between the predicted state value and the real state value. So if I use a neural network as my dynamics model (the true dynamics being unknown), is the MPC cost function equivalent to the NN's loss function? submitted by /u/Blasphemer666 [link] [comments]  ( 1 min )
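    The two objectives are related but distinct: the NN's training loss measures prediction error against real states (fitted offline), while the MPC cost scores candidate action sequences rolled out under the already-trained model. A minimal random-shooting sketch with toy stand-ins for both (the dynamics and cost functions here are hypothetical):

```python
import random

def mpc_plan(dynamics, cost, state, horizon=5, n_candidates=64):
    """Random-shooting MPC: roll out candidate action sequences through the
    learned dynamics model and score the *predicted* states with the task
    cost. The model's own training loss is a separate, offline objective."""
    best_seq, best_cost = None, float("inf")
    for _ in range(n_candidates):
        seq = [random.uniform(-1, 1) for _ in range(horizon)]
        s, total = state, 0.0
        for a in seq:
            s = dynamics(s, a)      # predicted next state
            total += cost(s, a)     # task cost evaluated on the prediction
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0]  # apply only the first action, then replan

# Toy 1-D system: drive the state toward 0.
random.seed(0)
dynamics = lambda s, a: s + a          # stand-in for a trained NN model
cost = lambda s, a: s * s + 0.01 * a * a
action = mpc_plan(dynamics, cost, state=2.0)
print(-1.0 <= action <= 1.0)  # → True
```

    Note the MPC cost never sees a real state at planning time; it only ever scores the model's predictions, which is why it cannot coincide with the model's loss function.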
    Learning To Play "Settlers of Catan" With Deep RL - code and write-up
    submitted by /u/henrythepaw [link] [comments]  ( 1 min )
    Which environments do you use for benchmarking?
    Hey guys I'm curious which environments you use to benchmark your standard RL algorithms. I typically use some environments from the OpenAI Gym or the DM control suite but benchmarking all my implementations against all environments for multiple seeds would take forever. Are there some of their environments you particularly like for benchmarking? submitted by /u/NiconiusX [link] [comments]  ( 1 min )
    How is advantage estimation done in PPO when episodes have variable length?
    In the PPO paper it is stated that we collect trajectories of length T from N different workers. Suppose I am not using multiple workers; then I have to collect episodes N times of fixed length T. But these episode lengths are variable, i.e. some episodes end well before T and some well after. So my question is: how do we calculate the advantage? According to the PPO paper, for the generalized advantage estimate we have to observe the reward of the terminal state. So how should I calculate GAE in this case? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
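    For what it's worth, a common way implementations handle this (a sketch, not the paper's exact procedure): bootstrap the value of the last collected state only when the segment was cut off mid-episode, and use zero past a true terminal state.

```python
def gae(rewards, values, last_value, terminated, gamma=0.99, lam=0.95):
    """Generalized advantage estimation for one trajectory segment.
    values[t] = V(s_t) for the T collected states; last_value = V(s_T),
    used to bootstrap only when the segment was truncated mid-episode."""
    advantages = [0.0] * len(rewards)
    next_value = 0.0 if terminated else last_value  # no bootstrap past a terminal state
    next_adv = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        advantages[t] = next_adv = delta + gamma * lam * next_adv
        next_value = values[t]
    return advantages

# Episode that truly ended (terminated=True): last_value is ignored.
adv = gae([1.0, 1.0], [0.5, 0.5], last_value=9.9, terminated=True, gamma=1.0, lam=1.0)
print(adv)  # → [1.5, 0.5]
```

    With this convention, episodes shorter than T contribute no bootstrap term (the terminal reward is observed directly), while episodes longer than T are bootstrapped with the critic's estimate of the cut-off state.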
    A silly question from a beginner
    I could not find an answer to a question that has been hanging around in my head for a while. Suppose we have some data, and we build up an MDP that captures the action and state dynamics. Would the optimal policy beat state-of-the-art RL algorithms? Edit: If that is the case, why does the community bother with learning algorithms, since finding a model of the dynamics is the key? submitted by /u/musicinthedark [link] [comments]  ( 2 min )
    What does the output of a reinforcement learning agent/algorithm look like in practice?
    Hello, I am relatively new to the area of machine learning/reinforcement learning, and I have a basic question regarding practical implementations: what does the output of a reinforcement learning agent/algorithm look like in practice? Is it like a 'look-up table' that sets the weights/parameters of the ML model based on the input data? Note that I am asking about what happens after the offline training of the agent. How do you implement the trained agent in practice, for example in an embedded system? Do you have references or clues to help clarify this? BR submitted by /u/b0bzera [link] [comments]  ( 1 min )
    Multi-discrete action spaces in SAC (Soft Actor-Critic)
    Hello! I am using SAC (Soft Actor-Critic) for a reinforcement learning task with only four steps; each action comes from one of four different action spaces. These four action spaces are essentially the same (they are all chemical compounds), and I just want the agent to take different types of compounds at each step. I have the following questions: Can the four different steps be trained? There is a paper with only four steps in its reinforcement learning process, but its action space is a single discrete action space. Is there any article I can learn from? I know that for game controllers there are usually multiple discrete action spaces, but each discrete space in my task has a larger dimension, such as [800, 700, 500, 600]. Thanks! submitted by /u/RangerWYR [link] [comments]  ( 1 min )
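    One common design for this (a sketch under stated assumptions, not a full SAC implementation) is a factored policy: one independent softmax head per action space, with the joint log-probability being the sum over heads, so large spaces like [800, 700, 500, 600] stay tractable.

```python
import math, random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample_multi_discrete(heads):
    """Sample one action per discrete space from independent softmax heads.
    heads is a list of logit vectors, e.g. of sizes [800, 700, 500, 600].
    Returns the actions and the joint log-probability (sum over heads),
    which is what a discrete SAC actor/entropy term would consume."""
    actions, log_prob = [], 0.0
    for logits in heads:
        probs = softmax(logits)
        a = random.choices(range(len(probs)), weights=probs)[0]
        actions.append(a)
        log_prob += math.log(probs[a])
    return actions, log_prob

random.seed(0)
heads = [[random.gauss(0, 1) for _ in range(n)] for n in [800, 700, 500, 600]]
actions, logp = sample_multi_discrete(heads)
print(len(actions), logp < 0)  # → 4 True
```

    In a real agent the logit vectors would come from the policy network (possibly conditioned on the step index, since each of the four steps draws from a different space), and the critic would evaluate Q-values per head.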
    Is a multi-armed bandit on-policy or off-policy?
    Quick question. submitted by /u/Asleep_Donut1382 [link] [comments]
    How wrong is it to use sampling at inference time ?
    At my company we use RL to solve our problem. The thing is, our problem is rather complex, and it is the core of our product (so clients rely on the results produced). In order to reach satisfying results despite an agent that doesn't learn very well, we use sampling at inference time: instead of taking the best trajectory according to the agent, we take X trajectories and keep only the one with the best reward. This seems completely fine at first (similar things are done in NLP, for example, with beam search), but in our case the sampling size is huge: 1024. Usually when using beam search, you might use a beam size of 6, maybe 10 if you have good hardware. Now, the agent does seem to be learning: the mean return is slightly increasing over time, the entropies of the actions are steadily decreasing, etc. The goal of the ML team is to improve the agent's learning so we can decrease the sampling size at inference time (because it's costly to run 1024 trajectories through the environment). But whatever we try, the improvements are not reflected (we compare all our experiments with 1024 samples in order to see what the customers will see). IMO this is because our sampling size is way too big; even a random agent can produce okay-ish results... Is my intuition right? submitted by /u/dummy-gummy [link] [comments]  ( 2 min )
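    The setup described is best-of-N (rejection) sampling. A toy sketch illustrating the intuition in the post: even a near-random policy can look fine once you keep the best of many rollouts, which masks differences in the agent's true quality. The `policy_sample` and `trajectory_reward` callables are hypothetical stand-ins for the post's agent and reward computation.

```python
import random

def best_of_n(policy_sample, trajectory_reward, n):
    """Best-of-N inference: sample n trajectories from the stochastic policy
    and keep the one with the highest reward."""
    best_traj, best_reward = None, float("-inf")
    for _ in range(n):
        traj = policy_sample()
        r = trajectory_reward(traj)
        if r > best_reward:
            best_traj, best_reward = traj, r
    return best_traj, best_reward

# Toy illustration: a random "policy" over integers, reward = closeness to 100.
random.seed(0)
policy = lambda: random.randint(0, 100)
reward = lambda t: -abs(100 - t)
_, r8 = best_of_n(policy, reward, n=8)
_, r1024 = best_of_n(policy, reward, n=1024)
print(r8, r1024)
```

    At N = 1024 the max over uniform draws is almost surely near-optimal regardless of the policy, so comparing experiments at that N compresses away most of the learning signal; evaluating at several smaller N values (e.g. 1, 8, 64) would make improvements in the agent itself visible.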
    Does a master's thesis/doctoral dissertation need to have implications down the line for it to be good?
    I'm in the last semester of my undergraduate degree; over the past couple of weeks, I've been trying to brainstorm ideas that I would like to pursue in my graduate research career. I'm interested in the emergence of language in multi-agent reinforcement learning environments, but I can't see how this would be important down the line when there are large language models that are completely dominating language and communication. Should this stop me from pursuing this idea, or should I let my interest in the idea take precedence? submitted by /u/clarky103 [link] [comments]  ( 1 min )
    How long would it take you to implement a MARL PPO agent with joint attention architecture?
    Out of curiosity, how long would it take to implement a paper like this one? https://arxiv.org/abs/2104.07750 It has PPO agents in MARL, all of them with multihead attention performed on the observation, in such a way that an attention map is created for each agent. This attention map has information about how strongly each agent is attending to various elements of the environment. With KL divergence, the agents are rewarded for minimizing the difference between their attention maps. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
  • Open

    [R] Using Gamma Distribution to Improve Long-Tail Event Predictions at Doordash
    Predicting long-tail events can be one of the more challenging ML tasks. Last year my team published a blog article where we improved DoorDash’s ETA predictions by 10% by tweaking the loss function with historical and real-time features. I thought members of the community would be interested in learning how we improved the model even more by using a Gamma distribution-based inverse sampling approach to loss function tuning. Please check out the new article for all the technical details and let us know your feedback on our approach. https://doordash.engineering/2022/04/06/using-gamma-distribution-to-improve-long-tail-event-predictions/ submitted by /u/pmp-dash1 [link] [comments]  ( 1 min )
    [D] ICML author response. What reviewers expect.
    Hi, we submitted to ICML for the first time. We got 4 reviews and 3 of them are mostly positive. Major comments from the reviewers include: more justification of the assumptions, discussion of the choices of parameters, and experiments in more complex and different environments. We want to address all the major and minor comments as best we can, but given that the response is limited to one page, we cannot explain everything in detail. I am not sure what the acceptable norm is here. Do reviewers expect the authors to conduct some experiments during the rebuttal and provide sample results, or just to explain what additional experiments we will conduct and how? And should the justification and reasoning be detailed, or does a brief explanation with an assurance to add a detailed discussion in the final version suffice? TIA submitted by /u/srvsinha186 [link] [comments]  ( 1 min )
    [D] Questions for a TPM for ML interview at Google
    Hey all, I have a technical program manager interview soon for an ML team at google and I want to know if anyone has any sample role-related questions I can gauge myself with. I have a strong data science & statistics background but that doesn't always translate to deep ML knowledge like an ML Engineer might have. Any resources or sample questions? I have not found adequate results from google regarding this team area specifically. submitted by /u/math_is_my_religion [link] [comments]  ( 1 min )
    [D] Does anyone know any high-accuracy models on the UCI Adult dataset?
    Hi everyone. This is my first post here, and I hope I did not break any sub rules. Currently, I am doing some research with the UCI Adult dataset (https://archive.ics.uci.edu/ml/datasets/adult). The first step is to build a high-accuracy classifier. Does anyone know a high-accuracy model on this dataset (more than 90%)? I have tried many machine learning models like logistic regression and neural networks, but no matter how complex the model is, I can only get an accuracy of about 85% on the test set. I tried to google it and found that many others also report similar results of about 85%. Any posts or papers would be helpful! Thanks in advance for your help! submitted by /u/Akasakura888 [link] [comments]  ( 1 min )
    [D] Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable
    Hey there, just a heads up we at The Gradient just published a new article discussing explainability - "This article uses the common backdrop of competitive games to explore the ways in which domain experts adapt to new technologies that lack explainability. I illustrate how interpretations vary based on user experience and model architecture, and how special care must be taken when adapting models to human-centric problems." Check it out here if you think it's interesting / worth discussing: Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable submitted by /u/regalalgorithm [link] [comments]  ( 1 min )
    [Project] Learning to Play "Settlers of Catan" With Deep RL - Writeup and Code
    Hi all, I just wanted to share a project I've been working on for the past year - using deep RL to learn to play the board game Settlers of Catan. I expect everyone is aware of the results that DeepMind/OpenAI have got recently on Go, DOTA 2, Starcraft 2 etc, but I was motivated to see how much progress could be made with existing RL techniques on a reasonably complex game - but with access to significantly less computational resources. Whilst I didn't end up with an agent that performs at a super-human level, there was clear learning progress and the results were quite interesting. I decided to do a full write-up of the project here, which I figured could be useful for anyone else who is interested in trying to apply DRL to a new, complicated environment. I also open-sourced all the code here for anyone interested. If anyone has any feedback or any questions at all that'd be great! submitted by /u/henrythepaw [link] [comments]  ( 3 min )
    [D] ICML rebuttals optional or semi-mandatory?
    Hi, We just submitted to ICML 2022 and got our reviews back. We were excited to see that 4/4 reviews were positive and acknowledged the contribution of the paper. However, there were some minor criticisms (e.g. not thorough enough lit review, could use a few more experiments) across several reviews. I was wondering if it is ever acceptable to not submit a rebuttal. Can a rebuttal in this case actually hurt us by rocking the boat? Or is the norm for ICML that you should always submit a rebuttal addressing all the reviewers' criticisms? We were wondering what the norm is for ICML specifically. submitted by /u/optimistic313 [link] [comments]  ( 1 min )
    [R] Hierarchical Text-Conditional Image Generation with CLIP Latents. This is the paper for OpenAI's DALL-E 2
    Blog post. Paper (pdf file format). The paper is also linked to in the above blog post. Abstract Contrastive models like CLIP have been shown to learn robust representations of images that capture both semantics and style. To leverage these representations for image generation, we propose a two-stage model: a prior that generates a CLIP image embedding given a text caption, and a decoder that generates an image conditioned on the image embedding. We show that explicitly generating image representations improves image diversity with minimal loss in photorealism and caption similarity. Our decoders conditioned on image representations can also produce variations of an image that preserve both its semantics and style, while varying the non-essential details absent from the image representation. We use diffusion models for the decoder and experiment with both autoregressive and diffusion models for the prior, finding that the latter are computationally more efficient and produce higher-quality samples. OpenAI's Sam Altman used DALL-E 2 to generate ~20 text prompt requests from Twitter users. The results are here, with individual result links and other samples in this comment from another Reddit user in a different post. Twitter thread about the paper (not from the paper authors). Sam Altman's blog post about DALL-E 2. Hopefully this summer, we’ll do a product launch and people will be able to use it for all sorts of things. submitted by /u/Wiskkey [link] [comments]  ( 3 min )
    Is the 'first boss attempt' phenomenon known to occur among NNs playing games, or is this learning trajectory unique to human players? [D]
    I'm curious whether this unusual learning trajectory observed in humans has also been observed in artificial neural nets. A well-known phenomenon in the 'Dark Souls' video game series is that one's first attempt at a boss is often much better than subsequent attempts. Boss HP at time of death by attempt might go something like: 35%, 55%, 85%, 87%, 75%, 54%, 60%, 43%, 27%, 38%, 12%, 0%. This sounds very anecdotal, but it's known to the community of these games to be a real thing. See this thread for evidence. Have NNs playing games been known to exhibit a similar pattern, with a peak in success early on, followed by a steep descent, then a slow gradual climb? Or is this a purely human phenomenon? My hypothesis as to why this happens is that over the course of the first couple of attempts, the player learns a bunch of bad strategies which must be slowly unlearned, whereas on attempt one, the player has no defined strategies, good or bad. submitted by /u/Greenface1998 [link] [comments]  ( 3 min )
    [R] Disentangling Abstraction from Statistical Pattern Matching in Human and Machine Learning
    submitted by /u/papajan18 [link] [comments]
    [Project][P] Who invented Graph Neural Networks?
    Just a side project (only for me) in which I try to sum up some of the history of DL. I can't be 100% sure this is the first article in which they appear: Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE transactions on neural networks, 20(1), 61-80. Would appreciate any help. Thanks submitted by /u/Siddh__ [link] [comments]  ( 2 min )
    [R] GP-BART: a novel Bayesian additive regression trees approach using Gaussian processes
    (not my paper) paper: https://arxiv.org/abs/2204.02112 abstract: "The Bayesian additive regression trees (BART) model is an ensemble method extensively and successfully used in regression tasks due to its consistently strong predictive performance and its ability to quantify uncertainty. BART combines "weak" tree models through a set of shrinkage priors, whereby each tree explains a small portion of the variability in the data. However, the lack of smoothness and the absence of a covariance structure over the observations in standard BART can yield poor performance in cases where such assumptions would be necessary. We propose Gaussian processes Bayesian additive regression trees (GP-BART) as an extension of BART which assumes Gaussian process (GP) priors for the predictions of each terminal node among all trees. We illustrate our model on simulated and real data and compare its performance to traditional modelling approaches, outperforming them in many scenarios. An implementation of our method is available in the R package rGPBART available at: https://github.com/MateusMaiaDS/gpbart." submitted by /u/bikeskata [link] [comments]  ( 2 min )
    [D] Anyone know about any interesting recent improvements with SNNs?
    I’m currently writing a research paper for my MSc on neuromorphic sensing and spike neural networks and most good papers are from around 2015 and was looking for something more recent. Anyone here heard of any interesting upgrades in architecture or applications? Cheers! submitted by /u/GandhisLittleHelper [link] [comments]
    Noticing that profs focus on male student’s goals and female student’s capabilities, any weigh-in? [D]
    Hello, I’m currently a graduate student. I do different projects, and for some I get to decide on what I want the scope to be. I do have to get the scope/plan/idea approved first. I pitch my ideas to profs who aren’t directly my profs, and normally 5-6 other students will pitch ideas to the same group of profs at the same time. I noticed that I get really different questions and feedback in comparison to my peers. I’m a female and my peers are male. I didn’t start out with this outlook, but I’m starting to search for reasons why I often get questioned about my capabilities to perform a project (which is normal enough, but I get questioned to the point where explaining my approach isn’t enough and they ask me for code examples), and my peers definitely do not get asked about their capabilities; rather, they tell them what they can do and they don’t get questioned. Really frustrating. submitted by /u/tyger-lily [link] [comments]  ( 4 min )
    [D] How to write a ML+Healthcare paper where the research was a framework with pre-trained models
    As a project in the course of my PhD, I had to create a prototype. My PhD is on the application of machine learning in health care. The project definition and scope were far too wide. However, I managed to create a working demo which encompasses some use cases of the project. At best, it can be called a framework, where I have put in different DL components and it works okay for those use cases only. Most of the components I have used are pre-trained language models (maybe fine-tuned to my use case). However, there is no active training or learning involved, because I created this for a demo only. I also created a very small dataset, tested the framework over it, and the results were okay. However, my supervisor now wants me to write a paper, as he is confident that the use case is rather unique and my working framework is a good first step. I believe his aim is to get me started on the paper-writing process, which I appreciate. However, I am not confident about it at all. My question is: is a 'framework' composed of pre-trained models with the end goal of solving a problem in health care good enough? Are there precedents for any such paper? And if I trust my supervisor's instincts, are there any fancy ways to frame the framework so that it does not look so basic? submitted by /u/Complex_State9960 [link] [comments]  ( 2 min )
    [P] Building a knowledge based recommender system
    I am trying to build a knowledge-based recommender system but have no prior experience. We first take in user inputs such as occasion, weather, top wear, bottom wear, and color. Based on this we want to create a knowledge base and recommend clothes. Can anyone help me with how to go about this process step by step, and what algorithms and technology should be used? submitted by /u/bills70 [link] [comments]  ( 1 min )
    [D] ICML 2022 Paper Reviews
    ICML 2022 paper reviews are supposed to be released soon. Creating a discussion thread for this year's reviews. submitted by /u/zy415 [link] [comments]  ( 2 min )
    [D] In general, should you let the model find interactions between many basic features, or should you use feature engineering to ‘help’ the model find the interaction?
    I’ll give an example to better explain my question (don’t get hung up on the numbers, it’s all made up). Say you are using a tree-based model to project how many points a player will score in a given basketball game. Most players shoot free throws at a slightly lower percentage on the road than they do at home. However, the magnitude varies player to player. Let’s assume for 95% of players with significant data, the ratio of home free throw percentage to away is 1 to 1.15. Generally speaking, older players are closer to 1 and younger players are around 1.1 (since older players get used to the opposing crowd). Now also say it takes 100 home and 100 away free throws to get a stable, reliable ratio. Now say a young player only has 50 home and 50 away free throws. With this amount of data he has a ratio of 1; however, the sample size is not enough to be fully stable. Which would be better: (1) multiple features into this model, i.e. his home/away ratio, the average ratio for players his age, his home free throw count, and his away free throw count; or (2) one feature, his ‘projected’ home/away ratio, which is a weighted average of his ratio with the average for players his age. Since he’s 50% of the way to significance, 0.5 * 1 + 0.5 * 1.1 = 1.05. The benefit of the first choice is that the model may find other interactions that I never conceived of; however, it could incorporate noise. Is there a general consensus, or is this just a try-both-and-see-what-works situation? submitted by /u/irndk10 [link] [comments]  ( 4 min )
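The single-feature ‘projected ratio’ described above is a classic shrinkage estimate, and is easy to compute directly. A tiny sketch (the function name and the 100-attempt stability threshold are illustrative, not from any real pipeline):

```python
def shrunk_home_away_ratio(player_ratio, attempts, group_ratio, stable_n=100):
    """Blend a player's observed home/away FT ratio with his age-group
    average, weighting by how close his sample is to being 'stable'."""
    w = min(attempts / stable_n, 1.0)  # fraction of the way to a stable sample
    return w * player_ratio + (1 - w) * group_ratio

# The example from the post: 50 of 100 attempts, observed ratio 1.0,
# age-group average 1.1 -> 0.5 * 1.0 + 0.5 * 1.1 = 1.05
print(shrunk_home_away_ratio(1.0, 50, 1.1))  # 1.05
```

With enough attempts the weight saturates at 1 and the player's own ratio is used unchanged, which is the usual behavior you'd want from this kind of blend.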
    Quick Little Keras Question
    I tried posting this on Stack Overflow with no response. I'm trying to use model.save() and keras.models.load_model() on this chunk of code. But, unlike some of the other Keras examples I've played with, this one seems to crash. I'm super new to this; any ideas why? I can post the error message if it helps. submitted by /u/HoneyBunchsOGoats [link] [comments]  ( 1 min )
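Hard to say without seeing the code or the error, but a quick sanity check is whether a plain save/load round-trip works in your environment at all. A minimal sketch with a stand-in model (the architecture, data, and filename are placeholders, not from the post):

```python
import numpy as np
from tensorflow import keras

# A stand-in model; replace with your own architecture.
model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.rand(16, 4), np.random.rand(16, 1), epochs=1, verbose=0)

# Round-trip: save to disk, load back, and check predictions match.
model.save("my_model.h5")
restored = keras.models.load_model("my_model.h5")

x = np.random.rand(2, 4)
assert np.allclose(model.predict(x, verbose=0), restored.predict(x, verbose=0))
```

If this round-trip works but your model crashes, the usual suspects are custom layers/objects (which need to be passed to load_model) or a subclassed model that the serializer can't introspect; posting the traceback would narrow it down.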
    Driving a robot with a neural network - use case study
    submitted by /u/KamilBugnoKrk [link] [comments]
    Here's an intuitive explanation of Singular Value Decomposition. 👇
    submitted by /u/mr-minion [link] [comments]  ( 1 min )
    Weekly China AI News: Slime Robot Grabs Swallowed Objects; SenseTime Revenue Grows Despite $2.7B Net Loss; Transformer Architecture Search Without Training
    submitted by /u/trcytony [link] [comments]
    Reading the Tea Leaves: Expert End-Users Explaining the Unexplainable
    submitted by /u/regalalgorithm [link] [comments]
    How do we know that AI hasn't already taken over our world? How do we know this isn't the Matrix? #simulation
    submitted by /u/Individual-Fly-610 [link] [comments]  ( 2 min )
    DALL·E 2
    submitted by /u/roblox22y [link] [comments]
    Learn how GANs work with a cool Toonify example!
    submitted by /u/OnlyProggingForFun [link] [comments]
    Artificial Intelligence, Machine Learning and the Higgs boson - Live talk with Dr. David Rousseau
    submitted by /u/aair_x [link] [comments]
    What are your thoughts about AI teachers?
    submitted by /u/curiosityVeil [link] [comments]  ( 1 min )
    Artificial Nightmares: Beauty Parlor || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    Fast and Luxurious: The Intelligent NIO ET7 EV Built on NVIDIA DRIVE Orin Arrives
    Meet the electric vehicle that’s quick-witted and fully outfitted. Last week, NIO began deliveries of its highly anticipated ET7 fully electric vehicle in Hefei, China. The full-size luxury sedan is the first production vehicle built on the NIO Adam supercomputer, powered by four NVIDIA DRIVE Orin systems-on-a-chip (SoCs). The post Fast and Luxurious: The Intelligent NIO ET7 EV Built on NVIDIA DRIVE Orin Arrives appeared first on NVIDIA Blog.  ( 2 min )
    NVIDIA Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests
    In its debut in the industry MLPerf benchmarks, NVIDIA Orin, a low-power system-on-chip based on the NVIDIA Ampere architecture, set new records in AI inference, raising the bar in per-accelerator performance at the edge. Overall, NVIDIA with its partners continued to show the highest performance and broadest ecosystem for running all machine-learning workloads and scenarios. The post NVIDIA Orin Leaps Ahead in Edge AI, Boosting Leadership in MLPerf Tests appeared first on NVIDIA Blog.  ( 3 min )
    Receive notifications for image analysis with Amazon Rekognition Custom Labels and analyze predictions
    Amazon Rekognition Custom Labels is a fully managed computer vision service that allows developers to build custom models to classify and identify objects in images that are specific and unique to your business. Rekognition Custom Labels doesn’t require you to have any prior computer vision expertise. You can get started by simply uploading tens of […]  ( 7 min )
    An optimized solution for face recognition
    When artificial intelligence is tasked with visually identifying objects and faces, it assigns specific components of its network to face recognition — just like the human brain.  ( 5 min )
    Does this artificial intelligence think like a human?
    A new technique compares the reasoning of a machine-learning model to that of a human, so the user can see patterns in the model’s behavior.  ( 7 min )
    DSC Weekly Digest: Moving Time
    For nine years, my family and I have lived in a house in Issaquah, a little community about twenty minutes east of Seattle. The town still retains its charms — a downtown area about three blocks long that includes a vintage (and long since decommissioned) gas station, numerous restaurants, a live theater, the library, and… The post DSC Weekly Digest: Moving Time appeared first on Data Science Central.  ( 7 min )
    Efficiently Initializing Reinforcement Learning With Prior Policies
    Posted by Ikechukwu Uchendu, AI Resident and Ted Xiao, Software Engineer, Robotics at Google Reinforcement learning (RL) can be used to train a policy to perform a task via trial and error, but a major challenge in RL is learning policies from scratch in environments with hard exploration challenges. For example, consider the setting depicted in the door-binary-v0 environment from the adroit manipulation suite, where an RL agent must control a hand in 3D space to open a door placed in front of it. The agent receives a reward signal only when the door is completely open. Since the agent receives no intermediary rewards, it cannot measure how close it is to completing the task, and so must explore …  ( 8 min )
    DALL·E 2
    DALL·E 2 is a new AI system that can create realistic images and art from a description in natural language.  ( 2 min )
    Sum the zeros of an analytic function without finding them first
    A couple days ago I wrote about how Vieta’s formulas let you sum the zeros of a polynomial without having to first compute the zeros. This is especially handy for high-order polynomials since there is no explicit formula for the zeros. Most functions that arise in applications are not polynomials. How could you find the […] Sum the zeros of an analytic function without finding them first first appeared on John D. Cook.  ( 3 min )
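For context, the identity the post title alludes to is, I believe, the generalized argument principle: if f is analytic inside and on a simple closed contour C, with zeros z_1, …, z_n inside C (counted with multiplicity) and none on C, then

```latex
\sum_{k=1}^{n} z_k \;=\; \frac{1}{2\pi i} \oint_C z \, \frac{f'(z)}{f(z)} \, dz
```

so the sum of the zeros comes out of a single contour integral, with no need to locate the zeros individually.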

    Last Week in AI: AI improves algae for biofuel and carbon capture, more AI decision-making in the military, and more!
    submitted by /u/regalalgorithm [link] [comments]
    Researchers From Allen Institute for AI Introduce ‘MERLOT Reserve’: A Novel Multimodal Video Question Answering Model
    We humans navigate the environment using all of our senses. Allen Institute researchers propose MERLOT Reserve, a model that learns to represent videos over time and across several modalities, including audio, subtitles, and video frames. It was trained using a new learning objective and more than 20 million YouTube videos. MERLOT Reserve is a unique, cutting-edge methodology for solving video-related inquiries. MERLOT Reserve can dependably choose the correct answer from a selection of multiple-choice answers when given a video and a question. This prediction is made by MERLOT Reserve jointly reasoning over the visual frames of the video, the video subtitles, and the audio in the video. Continue reading this cool research update from AI2 Paper: https://arxiv.org/pdf/2201.02639.pdf Demo: https://merlot-reserve.apps.allenai.org/ Project: https://rowanzellers.com/merlotreserve/ Github: https://github.com/rowanz/merlot_reserve https://preview.redd.it/031i6ty6err81.png?width=1920&format=png&auto=webp&s=299569e12160eb991f35a2c6b41c5758ff027235 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    EndlessVN open alpha today
    submitted by /u/roblox22y [link] [comments]
    AI Meets Quantum Technology in New Google Spinoff, Sandbox AQ - News
    submitted by /u/allaboutcircuits [link] [comments]
    Best undergraduate major besides computer science for pursuing a career in artificial intelligence?
    Hi, all. I got accepted into my top choice of college as an undecided major. Recently, I have decided to pursue artificial intelligence! Unfortunately, it is near impossible to transfer into computer science at my particular university. I was wondering if I can still pursue AI as a career if I complete one of the following majors: -Mathematics -Information or Data Science -Statistics -Linguistics Additionally, I could pursue one of these and minor in another. I should be able to minor in computer science as well if necessary. Hopefully, my choice of major would allow me to pursue research or an internship in artificial intelligence. I am willing to take additional summer courses and pursue relevant certifications to ensure that I am up to par with my computer science colleagues. (posted on behalf of a family member) submitted by /u/runelagoon [link] [comments]  ( 2 min )
    Comparing old and new AI voices from Replica Studios (new in second half)
    submitted by /u/autumns [link] [comments]
    Super-res model/program comparison
    I upscaled an image with a few different super-resolution models and programs; pick your favorite! https://files.botbox.dev/superrestestcollage.png Because of how Reddit is, I can't make this a poll, so comment your pick. Animated original version: https://www.youtube.com/watch?v=zRaTwVuqd70 (I will also make an animated version upscaled with the most-voted model/program) submitted by /u/Recent_Coffee_2551 [link] [comments]
    solo voiceovers
    I am looking for something to change my voice in a way that is more satisfactory and more convincingly varied than what simple voice modulation software can achieve, and as cheaply as possible (preferably free). Use case: I have been working on an animated movie of which I am the sole contributor. Though I have been putting it off while looking for an appropriate solution, the time has come to voice my various characters, who are a range of ages, both male and female. For several reasons, I am interested in voicing them all myself while doing the facial motion captures as well. What I am in need of is, essentially, something that does exactly what Respeecher does, but without the $200/month sub fee. I would love to be in a position to simply pay them what they are asking for in exchange…  ( 2 min )
    Artificial Nightmares: Frenzied Flame || Clip Guided Diffusion AI Art Video [4K 20 FPS]
    submitted by /u/Thenamessd [link] [comments]
    AI that takes multiple songs as input, and then generates a similar song or song with similar elements?
    I have been searching for a music AI that takes input as mp3 or midi files, yet haven't been successful yet. Is there such a thing? If not, is such a thing feasible? submitted by /u/16pxl [link] [comments]  ( 1 min )
    [D] Why aren't new LLMs using the Perceiver architecture?
    Perceiver and PerceiverIO (https://arxiv.org/abs/2107.14795) appear to offer significantly improved FLOP efficiency, but new LLMs (including Deepmind's own Gopher) don't use it. What gives? Is it still too new, or is the Perceiver architecture not appropriate for LLMs? submitted by /u/deeceeo [link] [comments]
    [R] Meta-Learning Machines in a Single Lifelong Trial: lecture video (24 min) presented at meta-learning workshops at ICML 2020 and NeurIPS 2021 (Schmidhuber YouTube Talk)
    Saw this posted on Schmidhuber's Twitter: Meta-Learning Machines in a Single Lifelong Trial: lecture video (24 min) presented at meta-learning workshops at ICML 2020 and NeurIPS 2021. URL of talk: https://youtu.be/2GgGVdkq2bU Abstract The most widely used machine learning algorithms were designed by humans and thus are hindered by our cognitive biases and limitations. Can we also construct meta-learning algorithms that can learn better learning algorithms so that our self-improving AIs have no limits other than those inherited from computability and physics? This question has been a main driver of my research since I wrote a thesis on it in 1987. In the past decade, it has become a driver of many other people's research as well. Here I summarize our work starting in 1994 on meta-reinforcement learning with self-modifying policies in a single lifelong trial, and - since 2003 - mathematically optimal meta-learning through the self-referential Gödel Machine. This talk was previously presented at meta-learning workshops at ICML 2020 and NeurIPS 2021. Many additional publications on meta-learning can be found at https://people.idsia.ch/~juergen/metalearning.html submitted by /u/hardmaru [link] [comments]  ( 1 min )
    [D] Hyperparameter Tuning: does it even work?
    Hi *, I've been working for the last 5 years as a Data Scientist. During this time I have tried dozens of times to improve my models via hyperparameter tuning, but I've never gotten improvements from it. I've tried all the possible approaches: grid search, random search, Bayesian search, etc. But in no case did I get satisfactory results. Does this happen to anyone else? Have you ever gotten robust improvements via HP tuning? submitted by /u/AM_DS [link] [comments]  ( 1 min )
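One way to make the question concrete on your own data is to compare a tuned model's cross-validated score against the library defaults; gains tend to show up only when the defaults are a poor fit for the problem. A minimal sketch with scikit-learn (the dataset and search space are illustrative):

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Baseline: library defaults, cross-validated.
default_score = cross_val_score(
    RandomForestClassifier(random_state=0), X, y, cv=3
).mean()

# Tuned: random search over an illustrative space.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 200),
        "max_depth": randint(2, 20),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=5, cv=3, random_state=0,
)
search.fit(X, y)
print(f"default={default_score:.3f}  tuned={search.best_score_:.3f}")
```

If the two numbers are routinely within noise of each other, that's evidence the defaults were already near-optimal for your problems, which matches the experience in the post.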
    [D] Autoregressive model for graph generation?
    Autoregressive models like GPT-2 do fairly well in text generation. Is it possible to do the same for graph data? A transformer based model Graphormer has recently shown its effectiveness in graph representation learning. Is there any way I can train Graphormer or any other model to generate graphs from an initial graph context? submitted by /u/ratt_m [link] [comments]
    [D] How do you guys hear about the latest papers?
    Hi! I'm a first-year grad student in Computer Vision and I am trying to get caught up on the latest research in my field. It seems like everyone in CS has heard about all of the latest papers but I just have no idea how. My knowledge is limited to general ideas, and I don't know any specific papers unless they have 20,000+ citations. So my question is: how do you hear about these papers and get caught up? Is there a reference somewhere that puts together a list of all the "must-read" papers that have come out? I feel like I am already 5 years behind in my knowledge. It would be great if there was something like "Top 5 papers of the week" that I could read to stay on top of things. Also, this doesn't just apply to Vision. I would like to have an idea of the other major developments in other fields (like NLP, general ML/DL, etc.) since I think that can carry over to my field. Thanks! Looking forward to your replies submitted by /u/TobusFire [link] [comments]  ( 2 min )
    [D] Fake authors and paper riders
    Based on my experiences in both academia and industry, I see that many researchers get listed as authors on papers solely for having attended the relevant project meetings, despite not contributing anything substantial to the work. I know of several people who've gotten on dozens of papers this way, despite not being able to explain the main details behind many of the papers they "co-authored." Of course, they can then claim credit for the work publicly as well as have their academic profile benefit from the citations accrued by the work. I've noticed that typically, these people are initially invited onto the project because they are on chummy terms with someone on the project. Concerningly, the more someone successfully "paper-rides" this way, the stronger their publication record looks, which makes it easier for them to find their way onto more projects to paper ride in the future. It seems that the obsessive focus on paper counts and citations has encouraged the rise of intellectually dishonest strategies for maximizing one's academic footprint. The huge research scientist salaries at top industry labs, which similarly obsess over paper counts and citations in their hiring process, only amplifies the incentive for paper riding. The reason I think it is bad: As more people paper ride, co-authorship on a paper gradually becomes a worse indication of expertise. Not to mention, paper riders are intellectually dishonest, by claiming credit for research that they didn't significantly contribute to. In a sense, it seems like a roundabout form of plagiarism. I know some might disagree with this take, as some people believe in being as generous about co-authorship as possible. I find that mindset to create the perfect environment for paper riders to flourish. I'm wondering if you've also seen paper riding happen and whether you think this behavior is good or bad. submitted by /u/alwayshumming [link] [comments]  ( 7 min )
    [D][R] Generate a random sample for the exponentiated Weibull distribution
    Hi there experts, I have an empirical distribution for which I ran this scipy script to detect the best fit. The script outputs 4 parameter values, and the best fit is an exponentiated Weibull distribution. Now I am clueless about how to generate a sample list of n values from it. I know how to do this for the normal distribution, treating the fitted params as mean and sigma, but how do I generate such a list here? Please help. https://preview.redd.it/79n28icmsqr81.png?width=1141&format=png&auto=webp&s=d9478691c06f5cdfe03af4f82db8293443e91f1e submitted by /u/GoldenDew9 [link] [comments]  ( 1 min )
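Assuming the four values the fit returned are exponweib's (a, c, loc, scale) parameters, drawing a sample is a single rvs call. A sketch with placeholder parameter values (substitute your fitted ones):

```python
from scipy.stats import exponweib

# Placeholder parameters -- substitute the four values your fit returned,
# in the order (a, c, loc, scale).
a, c, loc, scale = 2.5, 1.2, 0.0, 3.0
n = 1000

# Draw n random variates from the fitted exponentiated Weibull.
sample = exponweib.rvs(a, c, loc=loc, scale=scale, size=n, random_state=42)
print(sample.shape)  # (1000,)
```

This mirrors the normal-distribution case: there the two fitted params are mean and sigma, here the fitter hands back two shape parameters plus loc and scale, and rvs consumes them in that order.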
    [R][D] VAE Embedding Space - Can we force it to learn a metric?
    I understand that certain AE types such as B-VAE disentangle certain aspects of variation in the data, and those such as the Conditional AE or VAE allow us to separate these aspects with labels. However, what I have seen is that the embedding space doesn't cluster the images as well as some contrastive methods. Contrastive methods, on the other hand, require inelegant negative sampling etc. Can we somehow force the VAE to learn both the variational lower bound and a good metric between samples, such that visually similar samples are better clustered together? submitted by /u/jim_from_truckistan [link] [comments]  ( 1 min )
    [D] Jetson AGX Orin dev kit as a stand-alone training platform
    The Jetson Orin 64gb model has "275 Sparse|138 Dense INT8 TOPS", and I am a little confused about how to compare this to something like the RTX a6000's performance. I am looking to do deep rl training and am new to the field. What metrics make a difference for deep rl? Any thoughts on the Orin dev kit's ability to train deep rl? submitted by /u/here_to_create [link] [comments]  ( 1 min )
    [D] With the rise of AutoML, what are the important skills for a ML career?
    Some time down the road, when AutoML becomes more established, it can help us determine the best ML model and hyperparameters for a particular problem. This will not replace data scientists, as we still need data scientists for their domain knowledge, which is critical for scoping business problems, pre-processing data, and deriving business insights from the trained model. However, since data scientists will no longer need to deal with the technicalities of a model in the near future (i.e. they no longer have to tune hyperparameters, determine the best optimisation function, etc.), is there still a need for aspiring data scientists to learn about the intricacies and nuances behind the various models (maybe by coding the models from scratch)? Or is it enough for them to learn how to operate an AutoML system? (My question is referring to the corporate world in general and not to academia) Thanks in advance for your answers :) submitted by /u/smart_oinker [link] [comments]  ( 7 min )
    [P] AutoML-Conf Competition: DAC4AutoML
    Hi everyone! We've just launched a competition at the AutoML-Conf 2022, the DAC4AutoML competition. It has two tracks, one for configuring a Computer Vision model and one for a RL pipeline: https://automl.github.io/dac4automlcomp/ And what is DAC exactly? It means we want to find well-performing hyperparameter configurations like in Algorithm Configuration, but we do it dynamically - thus DAC, Dynamic Algorithm Configuration. As to how that is supposed to happen? We don't put any restrictions on the solutions for the competitions, so you can submit your hand-tuned static hyperparameter setting if you want. Or you can use some sort of heuristic, a regression model, reinforcement learning, ... whatever works. If you're interested in participating, you can submit from now on until the 18.06. AOE, the winners will be announced at the AutoML-Conf. submitted by /u/catsortion [link] [comments]  ( 1 min )
    [D] Imagenet Original Pictures
    As I understood it, ImageNet was generated from internet images, but I am unable to find the originals using naive image search. Is there any mapping? I wonder whether the ImageNet data are cropped versions of the original pictures or not; I don't see it in the paper. submitted by /u/LeanderKu [link] [comments]  ( 1 min )
    [P] UFO Lands on Highway! Or Depth Estimation using ML
    Article describing depth estimation using machine learning models and 3D visualization of depth maps using three.js. https://www.storminthecastle.com/posts/ufos_and_depth/ submitted by /u/CakeStandard3577 [link] [comments]
    [D] Could Stylegan-XL be great for out-of-domain generation?
    In the context of text-to-image generation, I'd say one of the reasons VQGAN is so used in popular notebooks is that it can deal with many concepts, while stylegan used to be limited to the domain it was trained for. That may be about to change with the rollout release of Stylegan-XL weights trained on Imagenet. This notebook (https://github.com/CasualGANPapers/StyleGANXL-CLIP) has had nice results with objects never seen by the model, such as "apple" and "ant", as well as scenes such as "judo athletes fighting" Please note that the Stylegan-XL weights are currently available for 128x128 pixels. ETA for the 256 resolution is 14.04.22 submitted by /u/HrodRuck [link] [comments]  ( 1 min )
    [Discussion] Support Vector Machines... in 2022
    My post is inspired by this discussion. In that thread, OP asked why support vector machines are still taught. People offered several thoughts: they're easier to think about, they're still perfectly good for some real-world problems, and for some problems they apparently rival deep networks. I did a project for a class around six years ago using an SVM as implemented in scikit-learn. I was pretty satisfied with the project, but I also experienced some frustrations, and came away with some questions. I started working with Tensorflow and DNNs in earnest soon after finishing that project, and I largely stopped thinking about SVM. I would like to revive the questions I asked, but never answered, here. A DNN with multiple outputs can potentially use a single neuron in the prediction of more than one output. For multiple, mutually-exclusive categories, this makes good sense. An SVM with multiple outputs in scikit-learn was implemented as pairs of one-vs-one SVMs, each of which was independently fit to data. This gets inefficient quickly. Has this changed? Can it be changed? DNN training at scale is a problem that many people have worked hard to make practical. Even non-experts like myself use our home GPUs to accelerate training of DNNs on large data sets. In scikit-learn, SVM training was implemented in a single thread on one CPU core. If you are performing cross-validation or a hyperparameter optimization study, it might be practical to parallelize fitting; one thread for each distinct condition. But can you parallelize the SVM fitting algorithm for a single condition? I went looking for software, but I couldn't find anything. Over to you folks. Cheers. submitted by /u/aotus_trivirgatus [link] [comments]  ( 6 min )
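On both questions, as far as I know the situation hasn't changed much: scikit-learn's SVC still fits k*(k-1)/2 one-vs-one binary SVMs under the hood (decision_function_shape only changes how the scores are aggregated), and a single libsvm fit is still single-threaded, so the practical parallelism is distributing (fold, hyperparameter) fits across cores via n_jobs. A sketch of both points (the dataset and grid are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Each fit is still a set of one-vs-one binary SVMs: k*(k-1)/2 of them.
clf = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)
print(clf.decision_function(X[:1]).shape)  # 3 classes -> 3 pairwise scores

# The practical parallelism: distribute (fold, hyperparameter) fits
# across CPU cores with n_jobs, rather than parallelizing one fit.
search = GridSearchCV(
    SVC(), {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]}, cv=5, n_jobs=-1
)
search.fit(X, y)
print(search.best_params_)
```

For parallelizing a single fit, the options I'm aware of live outside scikit-learn (e.g. GPU-backed SVM implementations), so that part of the original question still seems open within sklearn itself.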
    [R] Restormer: Efficient Transformer for High-Resolution Image Restoration (CVPR 2022--ORAL) + Colab Demo + Gradio Web Demo
    Visual results: With Restormer, you can remove noise, motion blur, defocus blur, and rain streaks from your own images. Paper: https://arxiv.org/abs/2111.09881 Github: https://github.com/swz30/Restormer Colab Demo: https://colab.research.google.com/drive/1C2818h7KnjNv4R1sabe14_AYL7lWhmu6?usp=sharing Gradio Web Demo: https://huggingface.co/spaces/swzamir/Restormer submitted by /u/swz30 [link] [comments]  ( 1 min )
    [R] [D] Seq2seq model hyperparameters tuning
    Does anyone have any advice or research papers on what hyperparameters researchers use for their seq2seq models? I am interested in knowing whether hyperparameters such as dropout, recurrent dropout, batchnorm, etc., are even necessary when using a seq2seq model, but couldn't find anything on it for weeks. Say you're using GridSearchCV: what hyperparameters do you tweak for your seq2seq model (other than the usual stuff like the number of neurons)? There is very little information on this for seq2seq models, and everyone just assumes that adding an attention mechanism solves everything without any hyperparameter tuning. I have also looked at published seq2seq code, and no hyperparameter tuning was shown whatsoever. FYI, this is in the context of time series data, using seq2seq, if that matters. Thanks submitted by /u/plsendfast [link] [comments]  ( 1 min )
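    For what it's worth, the knobs named in the question can at least be enumerated mechanically; a minimal sketch of building a search grid over them (all values are arbitrary illustrations, not recommendations):

```python
# Sketch: enumerate a hyperparameter grid for a seq2seq model.
# Keys and values are illustrative, not a recommended search space.
from itertools import product

grid = {
    "units": [32, 64],
    "dropout": [0.0, 0.2],
    "recurrent_dropout": [0.0, 0.2],
}
configs = [dict(zip(grid, v)) for v in product(*grid.values())]
print(len(configs))  # → 8
```

Each config dict can then be passed to whatever model-building function wraps the encoder/decoder.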
    [D] Has anyone seen any papers related to GANs which prove that the optimum remains unchanged when adding supervised loss (e.g. L1, L2)?
    I’ve been reading many papers lately pertaining to GANs, with more and more introducing supervised loss into the generator’s objective function. However, no one ever seems to show that the optimum remains undisturbed. Results seem to be strictly empirical most of the time. Has anyone seen any papers where it is shown that the disruption to the generator’s loss doesn’t harm convergence? submitted by /u/king_of_walrus [link] [comments]  ( 1 min )
    [D] What is your experience with Fake results or overfitted results being sold as awesome?
    I am curious what everyone's experience is with completely faked, falsified, or fabricated results in this area. Another aspect, I think, is people taking heavily overfitted results, finding one decent example from the test set, and claiming their method is awesome. How much of this have you seen, and how much of the research out there fits into this category? submitted by /u/LifeguardDismal142 [link] [comments]  ( 1 min )
    Win tickets to The AI Summit London 2022
    Sponsored Post Join the UK’s most forward-thinking technologists and business professionals this June in a celebration of emerging technology. Machine […] The post Win tickets to The AI Summit London 2022 appeared first on Machine Learning Mastery.  ( 2 min )
    MIT has trained AI to generate new molecular materials
    submitted by /u/aidev2040 [link] [comments]
    Building a Data Products-centric Business Model
    When I was the Vice President of Advertiser Analytics at Yahoo!, I painfully learned that my targeted user personas (Media Planners & Buyers and Campaign Managers) didn’t want more data in helping them optimize their marketing, campaign, and advertising spend across the Yahoo! Ad Network.  Heck, they didn’t even want analytics!  The aspirations for these… Read More »Building a Data Products-centric Business Model The post Building a Data Products-centric Business Model appeared first on Data Science Central.  ( 5 min )
    Data Discovery for ML Engineers
    Real-world production ML systems consist of two main components: data and code. Data is clearly the leader, and rapidly taking center stage. Data defines the quality of almost any ML-based product, more so than code or any other aspect. In Feature Store as a Foundation for Machine Learning, we have discussed how feature stores are… Read More »Data Discovery for ML Engineers The post Data Discovery for ML Engineers appeared first on Data Science Central.  ( 9 min )
    Content Metrics That Can Help You To Write Dissertations
    Many things can help you to write good dissertations. One of the most important is to use content metrics. It is necessary for all of the students to understand content metrics in detail. A clear understanding of its types and measuring strategies help you to evaluate things in a precise way. Whatever is your topic… Read More »Content Metrics That Can Help You To Write Dissertations The post Content Metrics That Can Help You To Write Dissertations appeared first on Data Science Central.  ( 5 min )
    Comparative analysis of an Intel and AMD Processor
    The need of a highly functional and fast processing Central Processing Unit (CPU) in today’s world is not just mostly desired, but also mostly required due to the rapid digitalization across the globe. Whether you work on a personal computer (PC) unit or laptop, the necessity of a highly advanced processor is indispensable.  This is… Read More »Comparative analysis of an Intel and AMD Processor The post Comparative analysis of an Intel and AMD Processor appeared first on Data Science Central.  ( 3 min )
    AI And Its Impact On Diversity And Inclusion
    How does artificial intelligence Diversity, Equity, and Inclusion (DEI) fit into the technological stack of daily companies? Fostering a diverse workforce is a very human problem. The cry for a halt to race prejudice has become deafening, and it’s increasingly a decisive factor for talent when weighing job offers and purchases. To stay up with the… Read More »AI And Its Impact On Diversity And Inclusion The post AI And Its Impact On Diversity And Inclusion appeared first on Data Science Central.  ( 4 min )
    How Data Intelligence Platforms Promote Business Success
    Understanding consumer behavior is becoming more and more critical as businesses seek to find innovative ways to survive and thrive in a period of constant change. In the last few years, the market has seen significant changes in the way people shop, travel, dine and purchase goods. As a business, when it comes to understanding… Read More »How Data Intelligence Platforms Promote Business Success The post How Data Intelligence Platforms Promote Business Success appeared first on Data Science Central.  ( 4 min )
    Exploring AI labeling for children’s products
    I read an article from the world economic forum which proposed an AI labeling system for AI products designed for children Today, for the first time, children are growing up in a world shaped by artificial intelligence (AI) and decisions are being made for children implicitly by AI.  Algorithms need data that is collected and… Read More »Exploring AI labeling for children’s products The post Exploring AI labeling for children’s products appeared first on Data Science Central.  ( 3 min )
    Need project suggestions
    I’ve been running circles in tutorial purgatory and I want to get out of it with sone projects. Anyone has any suggestions? Guided ones would be nice. For unguided ones, could you please provide source links/hints? submitted by /u/HellVollhart [link] [comments]  ( 1 min )
    Agent learns a policy when sampling the last episode from the replay buffer, but doesn't when sampling randomly from the replay buffer
    Hi all. I've been stuck on this problem for a while and I thought I might be able to find some help here. Any kind of assistance would be greatly appreciated. My setup is as follows. I have an environment with 3 agents. All 3 agents have a single policy network, and it is based on CommNet. My goal is to implement a replay buffer for this environment. I verified that my replay buffer logic is good. I tried running 3 different types of runs: Normal on-policy run: The agents perform an episode, and at the end of each episode the data (such as the states, actions, etc) from this episode are used to calculate the loss Using just the last episode from the replay buffer: The agents perform an episode, and the data is stored in the replay buffer. At the end of each episode, the last episode is sampled from the replay buffer (which is the episode that was just performed). This is just to confirm that my replay buffer is working properly, and the reward curve for this case matches that from (1). Using 1 random episode from the replay buffer: The agents perform an episode, and the data is stored in the replay buffer. At the end of each episode, a random episode is sampled from the replay buffer and used to calculate the loss. The performance is terrible in this case, and the environment times out each time For some reason, as soon as I turn on random sampling, progress is really bad. I'm sorry to pose such an open-ended question, but what are some things I could check to pinpoint the source of this problem? What could be a reason as to why performance is as expected when just sampling the last episode, whereas it is terrible when randomly sampling episodes? I've tried some things thus far but nothing has worked, and I turned to this community in hopes of getting some help. I'm new to the area of reinforcement learning, so I would be very grateful for any kind of help you can offer. Thanks in advance submitted by /u/lebr0n99 [link] [comments]  ( 3 min )
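    The two sampling modes described in the post can be sketched in a few lines (class and field names are hypothetical). One thing worth checking, offered as a guess rather than a diagnosis: if the loss is an on-policy policy gradient, randomly sampled old episodes are off-policy data, which by itself can explain the degradation unless importance weighting or an off-policy algorithm is used.

```python
# Minimal sketch of the two sampling modes described above (names hypothetical).
import random
from collections import deque

class EpisodeReplayBuffer:
    def __init__(self, capacity=1000):
        self.episodes = deque(maxlen=capacity)

    def add(self, episode):
        self.episodes.append(episode)

    def sample_last(self):
        # Reproduces on-policy behaviour: the episode just collected.
        return self.episodes[-1]

    def sample_random(self):
        # Off-policy: the sampled episode may come from a stale policy.
        return random.choice(self.episodes)

buf = EpisodeReplayBuffer()
for i in range(5):
    buf.add({"id": i, "transitions": []})
print(buf.sample_last()["id"])  # → 4
```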
    PPO sample correlation?
    Hi, I'm wondering if the PPO algorithm can solve the sample correlation problem of on-policy algorithms in training. PPO uses successive samples to compute GAE; doesn't the sample correlation occurring here interfere with learning? submitted by /u/noisemastar [link] [comments]
    [P] AutoML-Conf Competition: DAC4AutoML
    submitted by /u/catsortion [link] [comments]  ( 1 min )
    Temporal Difference Learning for Model Predictive Control
    submitted by /u/bendee983 [link] [comments]
    Why is there no rollout monitoring for this CustomEnv (on the right)?
    Output from using model.learn(env) on both Envs: on the left I have a simple dummy CustomEnv (using Stable-Baselines3 with Gym) for testing, and on the right I have my actual CustomEnv that I am working on in a project. As you can see, the dummy environment gives me the rollout monitoring, whereas there is no rollout monitoring for the actual environment (just time + train statistics/monitoring). I am using very similar code when setting up the training of the model; however, the complexity of the actual model is significantly higher than the dummy. In theory, the complexity of the environment shouldn't make a big difference to the monitoring, right? All of the key parts are still there (reward function, step function, reset function, etc.). In both cases it says that the environments are being wrapped by the 'Monitor' wrapper, so that can't be it. Does anyone know why this might be happening? submitted by /u/C_BearHill [link] [comments]  ( 1 min )
    Value Iteration in Car Racing V1
    I'm working on a Q-table learning model for OpenAI's Car Racing. I have everything done for a basic agent, but I'm unsure how I'm supposed to use the Box data for the action space and observation space to populate a Q-table. Or is this approach incorrect? Car Racing doesn't expose a P (transition probability) table, so I'm not sure how else I would do value iteration. submitted by /u/Dzartovian94 [link] [comments]  ( 1 min )
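    Without access to the transition probabilities P, value iteration isn't directly applicable; a common workaround for Box spaces is to discretize each continuous dimension into bins and run tabular Q-learning over the bin indices instead. A minimal sketch (bounds and bin counts are arbitrary illustrations):

```python
# Sketch: map a continuous Box observation to a tuple of bin indices,
# which can serve as a Q-table key. Bin counts here are arbitrary.
import numpy as np

def discretize(obs, low, high, bins):
    """Map a continuous observation to a tuple of bin indices."""
    ratios = (np.asarray(obs) - low) / (high - low)
    idx = (ratios * bins).astype(int)
    return tuple(int(i) for i in np.clip(idx, 0, bins - 1))

low = np.array([-1.0, -1.0])
high = np.array([1.0, 1.0])
bins = np.array([10, 10])
state = discretize([0.0, 0.95], low, high, bins)
print(state)  # → (5, 9)
```

The resulting tuple indexes a dict- or array-backed Q-table; the same trick applies per-dimension to a Box action space.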
    Action probabilities all collapsing to zero within a few steps
    Hey guys, I am new to RL and walking through the Keras implementation of Actor Critic. As a variant of it, I am trying to learn a strategy for WORDLE. However, after a few runs, my action probabilities all go down to zero. Not sure what's happening. Does anyone have any insights or pointers? Attaching my code for reference. Thanks import pandas as pd import numpy as np import random import string import random import tensorflow as tf from tensorflow import keras from tensorflow.keras import layers # Configuration parameters for the whole setup gamma = 0.9 # Discount factor for past rewards max_runs = 10000 eps = np.finfo(np.float32).eps.item() # Smallest number such that 1.0 + eps != 1.0 my_file = open("", "r") content = my_file.read() content = lis…  ( 3 min )
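    A common guard against action probabilities collapsing like this, offered here as a guess rather than a diagnosis of the truncated code above, is an entropy bonus in the actor loss. A minimal sketch (the coefficient is illustrative):

```python
# Sketch: entropy bonus for a categorical policy. Subtracting this term
# from the actor loss penalizes overly peaked action distributions.
import numpy as np

def entropy_bonus(probs, coef=0.01, eps=1e-8):
    """Scaled entropy of a categorical distribution."""
    probs = np.asarray(probs, dtype=np.float64)
    return coef * -np.sum(probs * np.log(probs + eps))

uniform = entropy_bonus([0.25, 0.25, 0.25, 0.25])
peaked = entropy_bonus([0.97, 0.01, 0.01, 0.01])
print(uniform > peaked)  # → True
```

A near-deterministic policy earns a smaller bonus, so the gradient pushes back against premature collapse.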
    Any RL-related conferences right after NeurIPS 22’?
    In case my NeurIPS submission rejected, lol. submitted by /u/Blasphemer666 [link] [comments]  ( 1 min )
    Robots dress humans without the full picture
    MIT researchers design a robot that has a trick or two up its sleeve.  ( 6 min )
    Reproducibility in Deep Learning and Smooth Activations
    Posted by Gil Shamir and Dong Lin, Research Software Engineers, Google Research Ever queried a recommender system and found that the same search only a few moments later or on a different device yields very different results? This is not uncommon and can be frustrating if a person is looking for something specific. As a designer of such a system, it is also not uncommon for the metrics measured to change from design and testing to deployment, bringing into question the utility of the experimental testing phase. Some level of such irreproducibility can be expected as the world changes and new models are deployed. However, this also happens regularly as requests hit duplicates of the same model or models are being refreshed. Lack of replicability, where researchers are unable to reproduce…  ( 9 min )
    Customize the Amazon SageMaker XGBoost algorithm container
    The built-in Amazon SageMaker XGBoost algorithm provides a managed container to run the popular XGBoost machine learning (ML) framework, with added convenience of supporting advanced training or inference features like distributed training, dataset sharding for large-scale datasets, A/B model testing, or multi-model inference endpoints. You can also extend this powerful algorithm to accommodate different requirements. […]  ( 5 min )
    Detect adversarial inputs using Amazon SageMaker Model Monitor and Amazon SageMaker Debugger
    Research over the past few years has shown that machine learning (ML) models are vulnerable to adversarial inputs, where an adversary can craft inputs to strategically alter the model’s output (in image classification, speech recognition, or fraud detection). For example, imagine you have deployed a model that identifies your employees based on images of their […]  ( 12 min )
    Unreal Engine and NVIDIA: From One Generation to the Next
    Square/Enix presents the fictional city of Midgar in Final Fantasy VII Remake at a filmic level of detail. Epic’s Fortnite bathes its environments in ray-traced sunlight, simulating how light bounces in the real world. And artists at Lucasfilm revolutionized virtual production techniques in The Mandalorian, using synchronized NVIDIA RTX GPUs to drive pixels on LED Read article > The post Unreal Engine and NVIDIA: From One Generation to the Next appeared first on NVIDIA Blog.  ( 4 min )
    Green Teams Achieve the Dream: NVIDIA Announces NPN Americas Partners of the Year
    A dozen companies today received NVIDIA’s highest award for partners, recognizing their impact on AI education and adoption across such industries as education, federal, healthcare and technology. The winners of the 2021 NPN Americas Partner of the Year Awards have created a profound impact on AI by helping customers meet the demands of recommender systems, Read article > The post Green Teams Achieve the Dream: NVIDIA Announces NPN Americas Partners of the Year appeared first on NVIDIA Blog.  ( 4 min )
    Bounding zeros of an analytic function
    The previous post looked at the problem of finding the zeros of a cubic polynomial. Assuming we’re going to use a numerical method to calculate the zero, the hard part is knowing where to tell the numerical method to look. That post showed how to use a change of variables to guarantee that the polynomial […] Bounding zeros of an analytic function first appeared on John D. Cook.  ( 2 min )
    Numerically finding roots of a cubic
    The analog of the quadratic formula for cubic equations is cumbersome. A lot of people naturally say “Forget all that. If I need to find the roots of a cubic, I’ll just use a numerical method like Newton’s method.” Sounds good. Where to start? But how do you know where to look for the roots? […] Numerically finding roots of a cubic first appeared on John D. Cook.  ( 3 min )
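    As a companion to the posts above, Newton's method on a cubic takes only a few lines once a starting point is chosen; the polynomial here is Newton's own classic example, not one from the linked posts:

```python
# Sketch: Newton's method on a cubic, started where the derivative is
# safely nonzero. x³ − 2x − 5 = 0 is Newton's classic worked example.
def newton(f, df, x0, tol=1e-12, max_iter=100):
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            break
    return x

f = lambda x: x**3 - 2*x - 5
df = lambda x: 3*x**2 - 2
root = newton(f, df, 2.0)
print(round(root, 6))  # → 2.094551
```

The hard part, as the post says, is choosing x0 so the iteration converges to the root you want.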
    Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data
    Highlights Extend the VAE framework to work with heterogeneous, i.e. multi-modal, data by projecting all “channels” to a common latent representation; Use variational dropout to learn sparse representation, which are useful to discover in an unsupervised manner the optimal number of dimensions, i.e. the number of ground truth generative factors.  ( 2 min )

    Mathematics and piano tuning
    The following is a slightly edited version of a Twitter thread on @AlgebraFact. The lowest C on a piano is called C1 in scientific pitch notation. The C one octave up is C2 and so forth. Middle C is C4. The frequency of Cn is approximately 2^(n+4) Hz. This would be exact if C0 were […] Mathematics and piano tuning first appeared on John D. Cook.  ( 2 min )
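    A quick numeric check of the approximation above against equal temperament (A4 = 440 Hz; C4 sits 9 semitones below A4):

```python
# Compare the 2**(n+4) approximation with equal-tempered pitch for middle C.
c4_equal = 440 * 2 ** (-9 / 12)   # ≈ 261.63 Hz, C4 is 9 semitones below A4
c4_approx = 2 ** (4 + 4)          # 256 Hz, the scientific-pitch approximation
print(round(c4_equal, 2), c4_approx)  # → 261.63 256
```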
    Computing functions of roots without computing roots
    Once in a while it’s necessary to calculate some function of the roots of a polynomial, and it may be possible to do this without first calculating the roots. Quadratics The quadratic formula gives explicit solutions to the equation The two solutions for x are where The awkward part is taking the square root of […] Computing functions of roots without computing roots first appeared on John D. Cook.  ( 3 min )
    FWHM for a quadratic
    This post derives a result I needed recently. The derivation is simple but a little tedious, so I wanted to save it in case I need it again. Full width half maximum A common way to measure the width of a peak in a function f(x) is to find the place x0 […] FWHM for a quadratic first appeared on John D. Cook.  ( 2 min )
    Number slang and numbered lists
    Here’s a list of five numbers used as slang in various contexts. Location (CB and police radio) End of column (journalism) Best wishes (ham radio) All aircraft in area (US Navy) I love you (text messages) The motivation for this post was an article Those HTML attributes you never use. I wanted to make a […] Number slang and numbered lists first appeared on John D. Cook.  ( 1 min )
    PPO Alg confusion
    As I read the paper and several tutorials, I am quite confused about the details. I see many implementations scale the running cumulative discounted reward in a certain way. However, each of them does it in a different way. let R be the running cumulative discounted reward, which is considered the best? Or is there a source of the method to use? implementations I saw from different places include: directly use R to calculate the advantage and training value network (most PPO tutorials use this) use (R / std(R)), where std is the mini-batch standard deviation use (R / std(R)), where std is the running standard deviation use ((R - mean(R)) / std(R)), where both mean and std are mini-batch wise use ((R - mean(R)) / std(R)), where both mean and std are running stats. do the above and clip to a certain range ([-10, 10] or [-1, 1]) I also see several different ways for the value network, let V be the output of the value network: output raw logit, without any scaling/output activation (most PPO tutorials use this) output raw logit, but use the same scaling as discussed above for running cumulative discounted reward, for example, if return value is (R / std(R)), value output will be (V / std(R)) do the same as 2, but use the stat of V instead of R for scaling, for example, if the return value is (R / std(R)), the value output will be (V / std(V)) output with tanh activation at the last layer output with tanh activation at the last layer, and multiply by a constant to match the range of the return any help would be appreciated, thanks! submitted by /u/seermer [link] [comments]  ( 2 min )
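    Option 5 from the list above (running mean and std) can be sketched with Welford's online algorithm; this is one illustrative implementation, not a claim about what any particular PPO codebase does:

```python
# Sketch: normalize returns with running statistics (Welford's online
# algorithm), then clip — option 5 plus clipping from the list above.
import numpy as np

class RunningMeanStd:
    def __init__(self):
        self.mean, self.m2, self.count = 0.0, 0.0, 1e-8

    def update(self, x):
        # Welford's update, one value at a time.
        for v in np.asarray(x, dtype=np.float64).ravel():
            self.count += 1
            delta = v - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (v - self.mean)

    @property
    def std(self):
        return np.sqrt(self.m2 / self.count) + 1e-8

rms = RunningMeanStd()
returns = np.array([1.0, 2.0, 3.0, 4.0])
rms.update(returns)
normalized = np.clip((returns - rms.mean) / rms.std, -10, 10)
print(normalized)
```

The running variant keeps the scale stable across minibatches, which is the usual argument for it over per-batch statistics.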
    I’m completely new to RL and will be building my first model as part of my degree-ending project. Do you have any tips you can provide?
    Hello all, As the title describes, I’ll be making my first model as part of my final project. I still have a pretty high-level understanding of everything, so forgive any inaccuracies as I describe what I’m going for. The problem I’m attempting to solve is known as the traveling salesman problem. Essentially, a route needs to be formulated that stops at n given locations. Finding the most efficient route with many stops by brute force is impractical because the number of possible routes grows factorially with each added location. The environment will simulate travel on city roads. Speed will be a constant, set to whatever the road’s speed limit is. I am using .pbf format vector GIS data from OSM so that the environment consists of real-world pathways. I’m using GeoPandas and Pyrosm to work with the data, and I’m collecting nodes for the locations of gas stations so that the environment can simulate needing to fuel the vehicle. Gas price will be a constant, as will vehicle fuel efficiency. Scoring will be based on the calculated time it would take to complete a route and the calculated cost (in gas). The goal will be to find the most efficient route to take when n = some large number. I’ve never worked with spatial data either, so I’m not sure what kind of challenges that poses. I worry that adding nodes for the locations of gas stations might be difficult. I’m also wondering if I’m better off using TensorFlow and Keras for this, but I’m not really aware of all the technical considerations I should be making before deciding on that. Do you have any tips that might help me out? Solutions to problems I haven’t hit just yet, but likely will? Thanks for your help! submitted by /u/professorDissociate [link] [comments]  ( 2 min )
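    One practical tip: before training an agent, a cheap classical baseline helps calibrate the reward scale, and a nearest-neighbour heuristic is the usual choice. A minimal sketch on made-up coordinates (exact enumeration is included only because n is tiny here):

```python
# Sketch: nearest-neighbour TSP baseline on made-up points, compared
# against brute-force enumeration (feasible only for tiny n).
import math
from itertools import permutations

points = [(0, 0), (1, 0), (1, 1), (0, 1)]  # illustrative unit square

def tour_length(order):
    return sum(math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
               for i in range(len(order)))

def nearest_neighbour(start=0):
    remaining = [j for j in range(len(points)) if j != start]
    order = [start]
    while remaining:
        nxt = min(remaining, key=lambda j: math.dist(points[order[-1]], points[j]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

best = min(permutations(range(len(points))), key=tour_length)
print(tour_length(nearest_neighbour()), tour_length(best))  # → 4.0 4.0
```

If the trained agent can't beat this baseline on real road-network distances, something in the reward design is worth revisiting.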
    Best implementations for extensibility?
    As far as I am aware, StableBaselines3 is the gold standard for reliable implementations of most popular / SOTA deep RL methods. However, having worked with them in the past, I don't find them to be the most usable when looking for extensibility (making changes to the provided implementations) due to how the code base is structured behind the scenes (inheritance, lots of helper methods & utilities, etc.). For example, if I wish to change some portion of a method's training update with SB3, it would probably involve overloading a class method before initialization, making sure all the untouched portions of the original method are carried over, etc. Could anyone point me in the direction of any implementations that are more workable from the perspective of extensibility? Ideally implementations that are largely self-contained in a single class / file, aren't heavily abstracted away across multiple interfaces, don't rely heavily on utility functions, etc. submitted by /u/Farconion [link] [comments]  ( 1 min )
    Is it possible to use inspect.getcallargs to convert *args and **kwargs to a canonical kwarg representation in RL?
    Given a NN class, is there anything specific we need to take care of when converting *args and **kwargs to a canonical kwarg representation? I ask because in this code from Google (https://github.com/google-research/google-research/blob/c56b47713b08c95ad427d5f93ee0dbb9ad008964/social_rl/multiagent_tfagents/joint_attention/attention_networks.py#L557) they use a TFDecorator-aware replacement for inspect.getcallargs, instead of using getcallargs directly. So my questions are: - Is it possible to use inspect.getcallargs to convert *args and **kwargs to a canonical kwarg representation? - If not, is there an equivalent in PyTorch? I couldn't find any, so I was wondering how people go about that. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
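    For ordinary Python callables (including a PyTorch nn.Module.forward, which is a plain method), the standard library alone can do this; the TFDecorator-aware variant matters only when a TensorFlow decorator hides the wrapped function's signature. Note that Signature.bind is the modern replacement for inspect.getcallargs; the helper below is a hypothetical sketch:

```python
# Sketch: canonicalize *args/**kwargs into a single kwargs dict using
# inspect.Signature.bind (the modern replacement for inspect.getcallargs).
import inspect

def canonical_kwargs(fn, *args, **kwargs):
    sig = inspect.signature(fn)
    bound = sig.bind(*args, **kwargs)
    bound.apply_defaults()  # fill in unsupplied defaults
    return dict(bound.arguments)

def forward(x, scale=1.0, bias=0.0):  # stand-in for an nn.Module.forward
    return x * scale + bias

print(canonical_kwargs(forward, 3, bias=2.0))
# → {'x': 3, 'scale': 1.0, 'bias': 2.0}
```

PyTorch has no special requirement here because its modules don't rewrap forward's signature the way TF decorators can.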
    OpenAI gym returns done==True but the goal is not reached
    Hi all, I am running some starter code from OpenAI gym (FetchReach-v1, FetchPush-v1) with env.action_space.sample(). But I don't see the goal actually being achieved when the returned done is True. I copied the code from here (https://openai.com/blog/ingredients-for-robotics-research/). I even let it sleep every step to watch more closely. Another related thing that I can't explain is that it always returns done==True rather quickly, with very few sampled actions. These all make me worried about using it as my task environment. submitted by /u/AnimatorRemarkable20 [link] [comments]  ( 1 min )
    Which elective: Monte Carlo Simulation or Computational Learning Theory?
    Hello /r/reinforcementlearning. I have to choose electives pretty soon, and as I am interested in reinforcement learning, I wanted to know which of these would be the most beneficial. Monte Carlo Simulation Computational Learning Theory The year after, I will also take a course on Reinforcement Learning, but it has not been created yet. Note: I can also take both if recommended; if I do so, I will take one of the courses before taking the RL course, and the other at the same time. Some further thoughts I've had: CLT includes Bandits, which is surely useful to know, but it seems to be only a rather small part, and I'm unsure whether all the other topics like PAC Learning and Rademacher Bounds are useful. MC is more practical while CLT is more theoretical (apparently VERY theoretical according to the course description above). I am not afraid of theoretical courses, but I struggle more with them than with practical courses. The sentiment around the MC course is that it is pretty good. I don't know anyone who has taken the CLT course. If I choose both, which order would you take them in? submitted by /u/John_Hitler [link] [comments]  ( 2 min )
    Ray RL lib observations normalized?
    Hey, I am using RLlib from Ray and I don't know whether observations are automatically normalized by the library or not. When creating a custom environment, Ray wants you to define an observation space; that would be a gym Box in my case. Anyway, I don't know the exact high and low values. My values lie between -1 and 1, more or less. My fear is that Ray would normalize the observation values to a new range although they are already processed. Does Ray normalize the observation space? If yes, how can I turn it off? Thanks! submitted by /u/Willing-Classroom735 [link] [comments]  ( 1 min )
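    For what it's worth, in RLlib versions from around this time the relevant knob appears to be the observation_filter config key, whose default "NoFilter" passes observations through unnormalized; treat this fragment as a hedged sketch and check your installed version's defaults rather than taking it as definitive:

```python
# Hedged config fragment (RLlib circa 2022; verify against your version).
# Observation filtering is opt-in: "NoFilter" (the default) leaves
# observations untouched; "MeanStdFilter" would enable running normalization.
config = {
    "env": "my_custom_env",            # hypothetical registered env name
    "observation_filter": "NoFilter",  # default; set explicitly to be safe
}
```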
    [CfP] EvoRL @ GECCO 2022. One week before the deadline!
    CALL FOR PAPERS EvoRL 2022 Evolutionary Reinforcement Learning workshop at GECCO 2022, July 9-13, Boston, USA In recent years reinforcement learning (RL) has received a lot of attention thanks to its performance and ability to address complex tasks. At the same time, multiple recent papers, notably work from OpenAI, have shown that evolution strategies (ES) can be competitive with standard RL algorithms on some problems while being simpler and more scalable. Similar results were obtained by researchers from Uber, this time using a gradient-free genetic algorithm (GA) to train deep neural networks on complex control tasks. Moreover, recent research in the field of evolutionary algorithms (EA) has led to the development of algorithms like Novelty Search and Quality Diversity, capable of…  ( 2 min )
    How PPO deals with episodes of Variable lengths?
    In the paper it is written to collect trajectories of length T, then calculate advantages, and then train the actor and critic networks. My question: suppose one episode ends well before T. If I run that episode up to length T, it will only collect negative rewards at each timestep, which in turn makes training impossible, as the return is a very large negative number. What can be done instead? I might be getting this wrong, so please correct me in the comments. submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
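    The usual answer is not to pad the episode out to T: rollouts of length T may span episode boundaries, and a done mask zeroes the bootstrap across resets. A minimal GAE sketch along those lines (all values are illustrative):

```python
# Sketch: GAE over a fixed-length rollout that may span episode
# boundaries; `dones` zeroes the bootstrap across resets.
import numpy as np

def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    T = len(rewards)
    adv = np.zeros(T)
    gae_t = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        mask = 1.0 - dones[t]           # 0 at terminal steps: no bootstrap
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae_t = delta + gamma * lam * mask * gae_t
        adv[t] = gae_t
        next_value = values[t]
    return adv

rewards = np.array([1.0, 1.0, 0.0, 1.0])   # episode ends at t=2
dones   = np.array([0.0, 0.0, 1.0, 0.0])   # reset between t=2 and t=3
values  = np.array([0.5, 0.5, 0.5, 0.5])
print(gae(rewards, values, dones, last_value=0.5))
```

After the reset, the environment keeps filling the same length-T batch, so no artificial negative rewards are needed.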
    Need help with OpenAI gym custom environment, state representation as "observation"
    Hello, I'm making a custom OpenAI gym environment to train various algorithms on. I have encountered some issues. My .flatten() method on the state class returns a large integer which can be converted back into the same state object. However, when I return this as the observation from environment.reset() and environment.step(), testing gives: "AssertionError: The observation returned by the `reset()` method does not match the given observation space", which I can "fix" by having it just return a 0. How do I go about resolving this? And are there any better approaches for training RL agents on a custom environment? Ty! submitted by /u/snaredrum_merchant [link] [comments]  ( 1 min )
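    The assertion fires because the returned observation must be a member of the declared observation_space; a single big integer only matches something like Discrete(n). The check boils down to shape/dtype/bounds tests like the stand-in below (the function and names are hypothetical, not gym's actual code):

```python
# Sketch: a minimal stand-in for a Box space's membership check, to show
# why a flattened-to-integer state fails against a Box((4,), float32) space.
import numpy as np

def box_contains(obs, low, high, shape, dtype=np.float32):
    obs = np.asarray(obs)
    return bool(obs.shape == shape and obs.dtype == dtype
                and np.all(obs >= low) and np.all(obs <= high))

good = np.zeros(4, dtype=np.float32)   # matches the declared space
bad = 123456                           # state flattened to one integer
print(box_contains(good, 0.0, 1.0, (4,)))  # → True
print(box_contains(bad, 0.0, 1.0, (4,)))   # → False
```

So either declare a Discrete space large enough to hold the integer encoding, or (usually better for learning) return the state as an array matching a Box space.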
    [D] Why do we still teach support vector machines?
    Honest question: are there any applications for which SVMs are the best choice? In my experience, no one seems to use this methodology anymore, though maybe I'm wrong. It just kinda feels like teaching people how to use a slide rule when everyone has calculators. submitted by /u/WartimeHotTot [link] [comments]  ( 3 min )
    [D] Paper Explained - Continual Backprop: Stochastic Gradient Descent with Persistent Randomness
    https://youtu.be/zEMOX3Di2Tc This paper finds what seems to be a new phenomenon when working in the continual learning/life-long learning domain. If new tasks are continually introduced to an agent, it seems to lose its ability to learn as more time progresses. Intuitively it's similar to the idea that "an old dog can't learn new tricks". They propose a fairly simple method of overcoming this limitation that involves resetting weights that are not contributing much to the outcome of the network. They call the method Continual Backprop. Outline: 0:00 - Overview 2:00 - Paper Intro 2:53 - Problems & Environments 8:11 - Plasticity Decay Experiments 11:45 - Continual Backprop Explained 15:54 - Continual Backprop Experiments 22:00 - Extra Interesting Experiments 25:34 - Summary Paper link: https://arxiv.org/abs/2108.06325 submitted by /u/SlickBlueML [link] [comments]  ( 1 min )
    [R] Google's 540B (Dense) model Pathways LLM, "Unlocks" new tasks proportional to scale
    Blog: https://ai.googleblog.com/2022/04/pathways-language-model-palm-scaling-to.html Paper: https://goo.gle/palm-paper - AFAIK from the blog post, scaling laws still hold up (i.e. not yet plateaued) - New transfer learning capabilities: outperforms fine-tuned models with 50x less data (Codex-12B) - The interesting part is how it handles techy geeky jokes and is able to correlate concepts and explain jokes, suggesting it is starting to do a bit more meta-learning than GPT-3 ever could... but still not enough to generate decent ones (though the joke wasn't particularly humorous, so I may be underestimating). SoTA on various tasks; chain-of-thought reasoning still holds up to scaling and outperforms some reasoning benchmarks; BIG-bench sees a huge improvement; and general LLM things :) submitted by /u/Competitive-Rub-1958 [link] [comments]  ( 4 min )
    [R] Minimum Description Length Recurrent Neural Networks
    https://arxiv.org/abs/2111.00600 submitted by /u/inland-1 [link] [comments]  ( 1 min )
    [P] Looking for a dataset
    Hey! New Here. I logged back into Reddit after years just to ask this question on this forum. I need to test a model, based loosely on BERT, that classifies a piece of text as having right or left political ideology leaning and whether it promotes any racial or religious stereotypes. For training purpose we used SBIC, IBC, and Stereoset. Though these only contain short sentences which are labeled as belonging to only one of the above categories. Is anyone aware of any other Dataset which can be used for this purpose, which hopefully contains text labeled as promoting or containing a political leaning (left/right, conservative/liberal, neutral) and further either any racial or religious stereotypes? Very thankful in adv submitted by /u/Fee_Imaginary [link] [comments]  ( 1 min )
    [P] Random Relational Graph Convolutional Networks (RR-GCN)
    📑 The Random R-GCN code has just been released! 📝 With just a few lines of code, you can now create embeddings of entities in a Knowledge Graph. Minimal example on how to create embeddings with RR-GCN. 💡 RR-GCN does not require training and is competitive with fully trained R-GCNs. 👉 https://github.com/predict-idlab/RR-GCN submitted by /u/givdwiel [link] [comments]
    [R] DiffusionCLIP: Text-Guided Diffusion Models for "Robust" Image Manipulation (CVPR 2022)
    submitted by /u/ImBradleyKim [link] [comments]  ( 1 min )
    [P] Transformers for Software Engineers
    submitted by /u/hardmaru [link] [comments]
  • Open

    Logging in Python
    Logging is a way to store information about your script and track events that occur. When writing any complex script […] The post Logging in Python appeared first on Machine Learning Mastery.  ( 22 min )
  • Open

    Build an MLOps sentiment analysis pipeline using Amazon SageMaker Ground Truth and Databricks MLflow
    As more organizations move to machine learning (ML) to drive deeper insights, two key stumbling blocks they run into are labeling and lifecycle management. Labeling is the identification of data and adding labels to provide context so an ML model can learn from it. Labels might indicate a phrase in an audio file, a car […]  ( 7 min )
    Enable Amazon Kendra search for a scanned or image-based text document
    Amazon Kendra is an intelligent search service powered by machine learning (ML). Amazon Kendra reimagines search for your websites and applications so your employees and customers can easily find the content they’re looking for, even when it’s scattered across multiple locations and content repositories within your organization. Amazon Kendra supports a variety of document formats, […]  ( 5 min )
    Interpret caller input using grammar slot types in Amazon Lex
    Customer service calls require customer agents to have the customer’s account information to process the caller’s request. For example, to provide a status on an insurance claim, the support agent needs policy holder information such as the policy ID and claim number. Such information is often collected in the interactive voice response (IVR) flow at […]  ( 6 min )
  • Open

    School of Engineering welcomes Thomas Tull as visiting innovation scholar
    Primary focus will be to advance and promote technology, innovation, and entrepreneurship across the school.  ( 4 min )
  • Open

    Microsoft Researchers Introduce ‘Jigsaw’: An AI Tool To Augment Large Language Models (GPT-3, Codex, etc.) By Deploying Post-Processing Techniques That Understand The Programs’ Syntax And Semantics
    GPT-3, Codex, and other sizable pre-trained language models can be adjusted to create code from natural language descriptions of programmer intent. Every developer in the world might benefit from these automated models, which have the potential to increase productivity. However, because the models may fail to understand program semantics, the quality of the generated code cannot be guaranteed. Microsoft researchers introduce Jigsaw, a new tool that can help these big language models perform better. Jigsaw is a Python Pandas API code generator that accepts multi-modal inputs. Jigsaw uses post-processing techniques to decipher the syntax and semantics of programs and then uses user feedback to improve future performance. Continue Reading Paper: https://arxiv.org/pdf/2112.02969.pdf Dataset: https://github.com/microsoft/JigsawDataset ​ https://i.redd.it/x223r5qu0kr81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
    submitted by /u/nick7566 [link] [comments]
    Generative AI+Alex Grey = xxxxxoooooooo (Disco Diffusion)
    submitted by /u/JoshGrambo [link] [comments]
    UiPath extract Tables from PDF (use case) (PDF table)
    submitted by /u/Cristi_UiPath [link] [comments]
    New RL technique achieves superior performance in control tasks
    submitted by /u/bendee983 [link] [comments]
    Metrics: Matthews correlation coefficient
    submitted by /u/TheTesseractAcademy [link] [comments]
    12 Graphs That Explain the State of AI in 2022
    submitted by /u/Tao_Dragon [link] [comments]
    What is WhatsApp Business API? How can it Help your Business?
    submitted by /u/mihircontra20 [link] [comments]
    Voice copying/cloning
    Hi all, I don't know if this is the right subreddit, but here goes... I'm looking to voice-clone my father. He passed away recently, and while it has been difficult for all of us, it's been especially hard for my mother, who married him young and was with him for 50 years. Her birthday is coming up, and I'd love to create a 5-10 second sound bite of him for her. Fortunately, there are likely to be lots of recordings of his voice around, since part of his job was speaking and instructing. So, is there any way this is possible without great difficulty, and with an accurate result? I understand the moral questions around recreating his voice after his death; I've thought about it quite a bit. However, I feel that it's for his soulmate who's struggling, who he had no qualms spending his life and travelling abroad with, and who he spent his last days with. I'm certain he would want to help. submitted by /u/mininggotboring [link] [comments]  ( 1 min )
  • Open

    Data Augmented Healthcare Part 2
    In the previous part, we discussed the current state of data imaging tools in healthcare and the future applications of these technologies. While increased access to information is invaluable to physicians, they can still be limited by their own ability to interpret, or the physical limitations of their surgical ability. In addition to augmenting the… Read More »Data Augmented Healthcare Part 2 The post Data Augmented Healthcare Part 2 appeared first on Data Science Central.  ( 3 min )
  • Open

    Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance
    Posted by Sharan Narang and Aakanksha Chowdhery, Software Engineers, Google Research In recent years, large neural networks trained for language understanding and generation have achieved impressive results across a wide range of tasks. GPT-3 first showed that large language models (LLMs) can be used for few-shot learning and can achieve impressive results without large-scale task-specific data collection or model parameter updating. More recent LLMs, such as GLaM, LaMDA, Gopher, and Megatron-Turing NLG, achieved state-of-the-art few-shot results on many tasks by scaling model size, using sparsely activated modules, and training on larger datasets from more diverse sources. Yet much work remains in understanding the capabilities that emerge with few-shot learning as we push the limits of …  ( 9 min )
  • Open

    Meet the Omnivore: Videographer Makes Digital Walls, Virtual Homes Pop With NVIDIA Omniverse
    Pekka Varis’s artistry has come a long way from his early days as a self-styled “punk activist” who spray painted during the “old school days of hip hop in Finland.” The post Meet the Omnivore: Videographer Makes Digital Walls, Virtual Homes Pop With NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 3 min )
  • Open

    Composing Music with Neural Networks
    Hey guys, I really love creating music algorithmically, which is why I have dedicated my master's thesis to generating music patterns with artificial intelligence. Over the past 12 months, I have programmed a deep recurrent neural network in Python and trained it on 200 self-made music patterns in order to generate somewhat novel motifs. To evaluate my model, I have set up a short online listening experiment. I'm looking for test subjects right now, so if you are interested in participating, I would really appreciate it. The listening experiment takes just about 5 to 8 minutes to complete, and the only thing you need is a pair of headphones. You can take part on your computer as well as on your smartphone or tablet. Here is the link to the listening experiment: https://forms.gle/rx1FUQ7RgpjMu1xx9 Thank you very much for taking the time to help me reach my goal. Really appreciate it. submitted by /u/JosephdeLaquinta [link] [comments]  ( 1 min )

  • Open

    [D] Unconventional computer vision problems that are intrinsically different from classifying ordinary stuff
    Given that most benchmarks for image classification are based on RGB (or grayscale) images of regular, everyday objects, what are some unconventional science cases where 2D inputs are substantially different from what we are used to perceiving by eye? For example, I'm interested in cases where spatial information can't be constrained to narrow pixel value ranges, such as exponential signals, or where standard normalisation (say min-max, z-score) and normalisation layers are not applicable and could lead to loss of information. One such case is astronomy; however, most practitioners try to adapt the problem to established standards (say fake RGB images, log-scaled flux images, etc.). What are other cases out there where the nature of the 2D inputs is very distinct from what we are used to parsing through our eyes and what deep nets are benchmarked on? I'm curious about tailored solutions that would intrinsically change the way deep nets are constructed to solve the research question. submitted by /u/astroferreira [link] [comments]  ( 1 min )
    [Research][Project][Library] Dogfooding a new Machine Learning data tool
    Hi everyone! I'm Atindriyo Sanyal, one of the founders of the ML company Galileo (https://rungalileo.io/). We're building a cool new tool/framework for ML practitioners that helps shine a light on the data you are training your models with. I'd love to get some feedback on the product, and since we're still in private beta, I'm looking for folks to try out the product on their datasets and models. It's easy to use and hooks into popular frameworks such as PyTorch, TensorFlow, Keras, spaCy, etc. Caveat: currently the tool only works for NLP use cases (think text classification, NER, etc.). I'll be giving $100 to folks who are willing to give some time to this and provide feedback on the usability of the product. If you're interested, here's a really tiny form (should take <1 minute to fill) for you to fill out. I'll review the applications and send you an email for a follow-up Zoom chat where I'll share the software artifacts with you! https://docs.google.com/forms/d/11V20C_J_SyNaX7QL6DasnTe7f0UiueUyaKdmt3xL1oI/edit Look forward and happy (machine) learning! - Atindriyo P.S. If you have any questions or want to chat personally, send me an email at [atin@rungalileo.io](mailto:atin@rungalileo.io). submitted by /u/atindriyo_galileo [link] [comments]  ( 1 min )
    [D] Pain points when using GPU instance platforms
    Hi everyone, I just launched a GPU compute instance platform (think lambdalabs, fluidstack, aws EC2, vast), and I was wondering what pain points everyone has with existing solutions. I'm not trying to sell anyone anything, but I want to look for feedback that will help me to build a better product. My current thoughts are Ease of getting data into the platform Ease of getting data off of the platform Automation for spinning up and down instances Availability of the type of instance you want Price too high Not enough/too many abstractions TIA and I look forward to some good discussions! submitted by /u/runpod-io [link] [comments]  ( 1 min )
    [R] Efficient-VDVAE: An open-source memory-efficient and stable very deep hierarchical VAE
    Hello everyone :) Last week we released our paper "Efficient-VDVAE: Less is more" with code! We present simple modifications to the Very Deep VAE that make it converge up to 2.6x faster and use up to 20x less memory. We also introduce a gradient smoothing technique to improve stability during training. Our model achieves comparable or better negative log-likelihood (NLL) on 7 commonly used datasets. Additionally, we make an argument against existing 5-bit benchmarks. We also show empirically that 3% of the latent space is enough to encode the data information without any performance loss, indicating the potential to efficiently leverage the hierarchical VAE's latent space in downstream tasks. Paper: https://arxiv.org/abs/2203.13751 Code: https://github.com/Rayhane-mamah/Efficient-VDVAE Paperswithcode: https://paperswithcode.com/paper/efficient-vdvae-less-is-more Feedback is very much appreciated! https://preview.redd.it/tjua1xpq3cr81.png?width=878&format=png&auto=webp&s=718bd91fd648acd673ddab1ad5342207e8be09e7 submitted by /u/Louay-AI [link] [comments]  ( 1 min )
    [R] DeepDPM: Deep Clustering With an Unknown Number of Clusters
    Hey everyone :) We've just released the code for our paper (accepted to CVPR 2022). DeepDPM is a nonparametric deep-clustering method which, unlike most deep clustering methods, does not require knowing the number of clusters, K; rather, it infers it as part of the overall learning. Using a split/merge framework to adaptively change the number of clusters, together with a novel loss, our proposed method outperforms existing (both classical and deep) nonparametric methods. While the few existing deep nonparametric methods lack scalability, we demonstrate ours by being the first such method to report its performance on ImageNet. Paper: https://arxiv.org/abs/2203.14309 Code: https://github.com/BGU-CS-VIL/DeepDPM/ Below are some examples of clusters our method found in ImageNet. https://preview.redd.it/jw5kvcuzfbr81.jpg?width=737&format=pjpg&auto=webp&s=5b61cdd0efdea7c92aba611171e5dc7f4276c892 submitted by /u/shahaff32 [link] [comments]  ( 4 min )
    [D] Why are confidence regions elliptic?
    Confidence regions are the 2D version of a confidence interval. Almost everywhere in the literature, the shape is elliptic, but no justification is provided. You would think that a confidence region of level γ is defined as the domain of minimum area covering a mass γ of the underlying probability distribution. That sounds perfectly logical, but it is mentioned nowhere. Based on this definition, the boundary of a confidence region is obtained by solving an optimization problem: a problem in the calculus of variations, finding a boundary curve encompassing a domain of minimum area. These problems are usually hard to solve, but in this case the solution seems trivial: it must be a contour line. And if the underlying distribution is Gaussian, contour lines are obviously ellipses. This would be a solid justification as to why ellipses are so widespread. https://preview.redd.it/42mr1t1je8r81.png?width=1072&format=png&auto=webp&s=2fb9cedbbb15895827ed00edc4912ac39fad0b71 My question here is whether my argument makes sense, or if there is something faulty in my math. I discuss it in more detail in one of my articles, here. If you need clarification, please reply on Reddit and I will do my best to explain. submitted by /u/MLRecipes [link] [comments]  ( 2 min )
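The Gaussian case the post describes can be checked numerically: for a bivariate Gaussian, the contour enclosing mass γ is the ellipse where the squared Mahalanobis distance equals the χ² quantile with 2 degrees of freedom, which has the closed form -2 ln(1-γ). A minimal sketch (my own illustration, not the article's code):

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([0.0, 0.0])
cov = np.array([[2.0, 0.8], [0.8, 1.0]])

gamma = 0.95
# For a bivariate Gaussian the squared Mahalanobis distance is chi-square
# with 2 dof, whose CDF is 1 - exp(-t/2), so the contour enclosing mass
# gamma is the ellipse d^2 <= -2 * ln(1 - gamma).
r2 = -2.0 * np.log(1.0 - gamma)

# Monte Carlo check: the ellipse should cover ~gamma of the mass.
x = rng.multivariate_normal(mean, cov, size=200_000)
d = x - mean
d2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
coverage = (d2 <= r2).mean()
```

The same contour-line argument applies to any unimodal density with convex contours; only the Gaussian special case makes those contours exactly elliptic.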
    [D] New Scaling Laws for Large Language Models
    https://www.lesswrong.com/posts/midXmMb2Xg37F2Kgn/new-scaling-laws-for-large-language-models submitted by /u/Singularian2501 [link] [comments]  ( 1 min )
  • Open

    Blog: Let’s manually approximate a simple function with a ReLU neural network
    submitted by /u/rhkibria [link] [comments]
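The idea in the linked post can be sketched by hand (my illustration, not the post's code): |x| is computed exactly by a two-neuron ReLU network, since |x| = relu(x) + relu(-x):

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# A hand-built one-hidden-layer ReLU net computing |x| exactly:
# hidden weights [1, -1], output weights [1, 1], all biases zero.
def net(x):
    return relu(1.0 * x) + relu(-1.0 * x)

xs = np.linspace(-3, 3, 7)
print(net(xs))  # matches np.abs(xs)
```

Adding more hidden neurons with shifted biases gives a piecewise-linear approximation of any continuous function on an interval, which is the general construction such posts usually build up to.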
  • Open

    Top Ways in Which AI Impacts Grocery Retail
    Artificial intelligence has been long making waves globally, empowering companies from across the broad spectrum of industries to take their businesses to the next level. So it is no surprise that this technology is making inroads in the grocery retail space, helping grocers deliver personalized and irreproachable experiences across different channels, establishing improved customer loyalty,… Read More »Top Ways in Which AI Impacts Grocery Retail The post Top Ways in Which AI Impacts Grocery Retail appeared first on Data Science Central.  ( 3 min )
    A different take on business intelligence
    Data is useless if it doesn’t shed light. The more light it sheds on the most acute problems businesses face, the better. Within this context, data synergy–data from multiple sources and disciplines that is more valuable than the sum of its parts–is often underappreciated. With data synergy, the light can be in many more places,… Read More »A different take on business intelligence The post A different take on business intelligence appeared first on Data Science Central.  ( 4 min )
    Blockchain Won’t Save The Metaverse
    Blockchain is widely touted as a mechanism for securing digital property. Multiple problems exist for driving metaverse transactions. A new review highlights the challenges, some of which may be insurmountable. Blockchain has been touted as a potential solution to securing users’ digital content and data due to its decentralization, immutability, and transparency. However, there are… Read More »Blockchain Won’t Save The Metaverse The post Blockchain Won’t Save The Metaverse appeared first on Data Science Central.  ( 4 min )
  • Open

    How does the ACER algorithm work?
    I am currently writing a report on reinforcement learning, where I am trying to describe how the ACER algorithm works. I have read the arXiv paper on sample-efficient actor-critic with experience replay, but I don't understand where the experience replay comes in. Is it part of the policy gradient, where the policy is updated every episode from the knowledge it gathered in previous episodes? https://arxiv.org/pdf/1611.01224.pdf submitted by /u/beepingwater_neko [link] [comments]  ( 1 min )
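For context (my summary, not a quote from the paper): ACER stores whole trajectories, including the behaviour policy's action probabilities, in a replay buffer, and replays them for off-policy actor-critic updates corrected with truncated importance weights. A minimal buffer sketch under those assumptions (not ACER itself):

```python
import random
from collections import deque

class TrajectoryReplay:
    """Stores whole episodes, each step keeping the behaviour policy's
    action probability so importance weights pi(a|s) / mu(a|s) can be
    computed at replay time, as ACER's off-policy correction requires."""

    def __init__(self, capacity=1000):
        self.episodes = deque(maxlen=capacity)

    def add_episode(self, steps):
        # steps: list of (state, action, reward, behaviour_prob) tuples
        self.episodes.append(list(steps))

    def sample(self):
        # ACER replays whole trajectories, not isolated transitions.
        return random.choice(self.episodes)

buf = TrajectoryReplay()
buf.add_episode([("s0", 0, 1.0, 0.5), ("s1", 1, 0.0, 0.9)])
episode = buf.sample()
```

So the replay is separate from the policy gradient itself: the gradient is computed on replayed data, with the stored probabilities correcting for the fact that the data came from an older policy.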
    What’s the best way to implement tree-based function approximators for RL/control?
    Sorry if this post is not appropriate here, but I have been wondering how I can implement and train a decision tree or any other non-differentiable function approximator for the value function. It's relatively easy to formulate DQN-type algorithms using a neural network with, say, PyTorch plus stochastic optimization, but I want to try out some tree-based methods (at least to reproduce papers which claim to use them). But I don't know: 1) Do we have to design the structures and learning algorithms by hand, or is there any package I can use? 2) How should the learning be done? We obviously can't do ordinary regression-style learning directly because of the bootstrapping nature of the Bellman equation, can we? Thanks submitted by /u/Htaseht [link] [comments]  ( 1 min )
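One standard answer (fitted Q-iteration, which I am suggesting here rather than something named in the post) sidesteps gradient updates entirely: each iteration refits a fresh tree ensemble on bootstrapped Bellman targets, so any off-the-shelf regressor works. A sketch assuming scikit-learn is available:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

def fitted_q_iteration(transitions, n_actions, n_iters=10, gamma=0.99):
    """Fitted Q-iteration; transitions are (state, action, reward, next_state)."""
    X = np.array([np.append(s, a) for s, a, _, _ in transitions])
    R = np.array([r for _, _, r, _ in transitions])
    S2 = np.array([s2 for _, _, _, s2 in transitions])
    q = None
    for _ in range(n_iters):
        if q is None:
            targets = R  # iteration 0: the Q estimate is the immediate reward
        else:
            # Bellman targets r + gamma * max_a' Q(s', a'), computed in batch.
            q_next = np.column_stack([
                q.predict(np.hstack([S2, np.full((len(S2), 1), float(a2))]))
                for a2 in range(n_actions)
            ])
            targets = R + gamma * q_next.max(axis=1)
        # Each iteration refits a brand-new ensemble: no gradients needed,
        # so the bootstrapping lives in the targets, not in the regressor.
        q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, targets)
    return q

# Toy usage: 1-D states, 2 actions.
transitions = [(np.array([0.0]), 0, 1.0, np.array([1.0])),
               (np.array([1.0]), 1, 0.0, np.array([0.0]))]
q_model = fitted_q_iteration(transitions, n_actions=2, n_iters=3)
```

This is essentially how the tree-based RL papers handle the Bellman bootstrap: turn each value-iteration sweep into an ordinary supervised regression problem.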
    I'm working on a DQN agent using the Keras RL library to play Atari games, however a weird thing keeps happening where every episode is the same length but it's a random number each time.
    The episode step count is the same for training and testing. submitted by /u/Gleann_na_nGealt [link] [comments]  ( 1 min )
    Are there any real life projects I could do with this? How do I get ideas to use this?
    I tried using RL for some work at my university, and it did not really work all that well. I'm wondering if there are some real-life scenarios that I could use to create my own personal projects. Otherwise, I'd be fine with games. I want to try all of it out, from dynamic programming to crazy stuff like A3C, PPO, and so on. I like RL more than any other form of ML, and I want to play around with it as a hobby. This is really the area of ML for me. For starters, there are fundamentals behind it, so you can mostly explain why agents do one thing or another. Additionally, there isn't a need for massive amounts of data. It's also the only type of ML that I've been able to successfully combine with software engineering: designing the agent and the API it uses to take actions in an environment is as much a software engineering project as creating a REST API. I feel there is a lot of potential for me to go crazy with this, and I was wondering if people have any cool suggestions. Anything real-time is exactly the sort of thing I love to do. submitted by /u/HesperusIII [link] [comments]  ( 2 min )
  • Open

    AI News | ALS Brain Computer Interface 1 Year Human Trial Results | Skin Cancer Detection | New IBM AI Hardware
    submitted by /u/getrich_or_diemining [link] [comments]
    Your Next Teacher Will be a Machine: Why the Future of Education is Automation
    submitted by /u/itsallshit-eatup [link] [comments]
    Hi! I'm wondering if anyone could help me 🇦🇷🇦🇷
    I'm a 19-year-old guy from Argentina studying systems engineering. I like my career; being an engineer is great, but coding and AI are greater. I'm tired of courses like Free Code Academy and other basic things; I'm looking for more professional, useful, and deeper courses that will really teach me. I'm currently at the basics with Python (pandas, NumPy, Matplotlib, TensorFlow) and I want to get better in this field that I love ❤ submitted by /u/Sasulanda [link] [comments]  ( 1 min )
    Active non-ML research areas?
    What are the most active non-ML/statistical research areas in AI? Are there any recent books published that give an overview of such areas? Seems like AI is now either ML or people saying that ML won’t work, but vague on alternatives. submitted by /u/spookyplatypus [link] [comments]  ( 1 min )
    Heard about GitHub Copilot? Now Meet Salesforce's 'CodeGen': An AI Model That Turns Simple Natural Language Requests Into Executable Code
    Imagine being able to tell a machine to write an app simply by telling it what the app does. As far-fetched as it may appear, this scenario is already a reality. According to Salesforce AI Research, conversational AI programming is a new paradigm that brings this vision to life, thanks to an AI system that builds software. Introducing CodeGen: Creating Programs from Prompts The large-scale language model, CodeGen, which converts simple English prompts into executable code, is the first step toward this objective. The person doesn’t write any code; instead, (s)he describes what (s)he wants the code to perform in normal language, and the computer does the rest. Conversational AI refers to technologies that allow a human and a computer to engage naturally through a conversation. Chatbots, voice assistants, and virtual agents are examples of conversational AI. Continue Reading Paper: https://arxiv.org/pdf/2203.13474.pdf Github: https://github.com/salesforce/CodeGen https://i.redd.it/dbyba3dct8r81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 2 min )
  • Open

    Dan Huttenlocher ponders our human future in an age of artificial intelligence
    For the MIT Schwarzman College of Computing dean, bringing disciplines together is the best way to address challenges and opportunities posed by rapid advancements in computing.  ( 8 min )
  • Open

    Enhancing Satellite Imagery using Deep Learning for the Sensor To Shooter Timeline. (arXiv:2203.00116v3 [cs.CV] UPDATED)
    The sensor to shooter timeline is affected by two main variables: satellite positioning and asset positioning. Speeding up satellite positioning by adding more sensors or by decreasing processing time is important only if there is a prepared shooter; otherwise the main source of time is getting the shooter into position. However, the intelligence community should work towards the exploitation of sensors to the highest speed and effectiveness possible. Achieving high effectiveness while keeping speed high is a tradeoff that must be considered in the sensor to shooter timeline. In this paper we investigate two main ideas: increasing the effectiveness of satellite imagery through image manipulation, and how on-board image manipulation would affect the sensor to shooter timeline. We cover these ideas in four scenarios: discrete event simulation of onboard processing versus ground station processing, quality of information with cloud cover removal, information improvement with super resolution, and data reduction with image to caption. This paper shows how image manipulation techniques such as Super Resolution, Cloud Removal, and Image to Caption improve the quality of delivered information, in addition to showing how those processes affect the sensor to shooter timeline.  ( 2 min )

  • Open

    AgentZero: Ray & PyTorch based light-weight Distributed Fast Reinforcement Learning Framework
    AgentZero https://github.com/zhoubin-me/agent0 This is my personal project, developed two years ago. It covers major DRL algorithms such as: [DQN](https://arxiv.org/abs/1312.5602), [Double DQN](https://arxiv.org/abs/1509.06461), [Dueling DQN](https://arxiv.org/abs/1511.06581), [Prioritized Experience Replay](https://arxiv.org/abs/1511.05952), [Noisy Network](https://arxiv.org/abs/1706.10295), [C51](https://arxiv.org/abs/1707.06887), [Rainbow](https://arxiv.org/abs/1710.02298), [QR-DQN](https://arxiv.org/abs/1710.10044), [IQN](https://arxiv.org/abs/1806.06923), [FQF](https://arxiv.org/abs/1911.02140), [DDPG](https://arxiv.org/abs/1509.02971), [SAC](https://arxiv.org/abs/1801.01290), [TD3](https://arxiv.org/abs/1802.09477), and [MDQN](https://arxiv.org/abs/2007.14430). What is amazing is its speed and memory efficiency after some optimization: with a single 2080Ti GPU and an 8-core AMD CPU, the training speed of Rainbow on Atari can reach 3000 FPS, which means it can finish training on 10M frames within 1 hour. With compression of image frames, the replay memory's RAM usage is down by 20%. I have tested several algorithms and games on Atari and obtained some initial results. Welcome to use and contribute! submitted by /u/zhoubin-me [link] [comments]  ( 1 min )
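The frame-compression trick mentioned above can be sketched with the standard library (an illustration of the general idea, not AgentZero's actual code):

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
frame = rng.integers(0, 16, size=(84, 84), dtype=np.uint8)  # Atari-style frame

# Store compressed bytes in the replay buffer instead of raw arrays...
packed = zlib.compress(frame.tobytes())

# ...and inflate only when a minibatch is sampled.
restored = np.frombuffer(zlib.decompress(packed), dtype=np.uint8).reshape(84, 84)
```

The trade-off is a small decompression cost on each sampled minibatch in exchange for a much larger replay buffer fitting in RAM.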
    Regularization for DRL: reward or objective function?
    I am searching for regularization methods applied to DRL algorithms (either value- or policy-based) to understand what has been done so far in the field. I cannot find any solid reference that studies the effect of applying a soft constraint to the reward function instead of to the policy objective. This may seem useless for some applications, but it is highly relevant for finance, which is my domain of application. The idea I have so far is that if you constrain the reward, it is as if you are imposing limits on the agent's behavior. On the other hand, if you constrain the objective, you are not limiting the behavior but correcting the undesired behavior ex post. The latter does not allow the agent to learn not to behave in a certain way. Did anyone ever think about this? Are there good references that analyze the different effects of such a constraint on any DRL algorithm? submitted by /u/alebrini [link] [comments]  ( 1 min )
    What does it mean to feed the "network state" in an LSTM in the actor network?
    I am looking at this code from Google (https://github.com/google-research/google-research/blob/master/social_rl/multiagent_tfagents/joint_attention/attention_networks.py). At line 639, the LSTM is called. The first two inputs are the state and the network state, but I don't understand what the latter is. submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
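In tf-agents, which that file builds on, the "network state" is the RNN's recurrent state: for an LSTM, the (hidden, cell) pair that the caller threads from one call to the next. A framework-free sketch of the pattern (my illustration, not Google's code):

```python
import numpy as np

def lstm_step(x, state, params):
    """One LSTM step. `state` is the 'network state': the (h, c) pair the
    caller must carry between calls, which is exactly what tf-agents
    networks accept and return alongside the observation."""
    h, c = state
    W, U, b = params                       # the four gates packed 4*n wide
    z = x @ W + h @ U + b
    n = h.shape[-1]
    sig = lambda v: 1.0 / (1.0 + np.exp(-v))
    i, f = sig(z[:, :n]), sig(z[:, n:2*n])           # input / forget gates
    o, g = sig(z[:, 2*n:3*n]), np.tanh(z[:, 3*n:])   # output gate / candidate
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, (h, c)                       # output and the NEW network state

rng = np.random.default_rng(0)
n_in, n_hid, batch = 3, 4, 2
params = (rng.normal(size=(n_in, 4 * n_hid)),
          rng.normal(size=(n_hid, 4 * n_hid)),
          np.zeros(4 * n_hid))
state = (np.zeros((batch, n_hid)), np.zeros((batch, n_hid)))  # initial state
for t in range(5):                         # thread the state through time
    out, state = lstm_step(rng.normal(size=(batch, n_in)), state, params)
```

So in that attention-network code, the second argument is not a second observation: it is the memory carried over from the previous timestep, initialized to zeros at episode start.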
    DeepRL and Rubik’s Cube
    I'm part of a group of researchers from top ML institutions and industry; our goal is to figure out how to improve efficiency in DeepRL. We are looking at the Rubik’s Cube as a target problem and kicking off a project which will start from https://github.com/forestagostinelli/DeepCubeA and go from there. Prior works require a hand-crafted curriculum and billions of interactions to solve a cube, which we believe is orders of magnitude more compute than it should take. Is anyone interested in collaborating? I'm happy to dedicate a few hours a week to help a newcomer (like I was a few years ago) with the RL side, given some basics of machine learning and programming skills, and this could be a golden opportunity for someone to see RL at scale. submitted by /u/mind_library [link] [comments]  ( 2 min )
    Multi agent reinforcement learning
    I'm absolutely new to machine learning, let alone reinforcement learning. I've been tasked to replicate and if possible improve upon the paper linked in the post. I don't know what platform to use and how to create the custom environment. if anybody could share any resources it would be tremendously helpful. https://drive.google.com/file/d/1fIT43hKi61WUIvoTh2a3AWlRsphi-L98/view?usp=sharing submitted by /u/Lazarus_07 [link] [comments]  ( 1 min )
    q-learning vs. policy gradient
    Trying to wrap my head around the RL essentials. Would it be correct to say that Q-learning attempts to select the best available policy by optimizing the Q-function, while policy gradient methods work directly to optimize a pre-determined policy's parameters? submitted by /u/JimBeanery [link] [comments]  ( 1 min )
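Roughly yes, with one nuance: the policy in policy gradient methods isn't "pre-determined", it's parameterized, and the gradient adjusts those parameters directly. A tabular contrast of the two update rules (illustrative only, not from any particular textbook):

```python
import numpy as np

n_states, n_actions, alpha, gamma = 4, 2, 0.1, 0.9

# Q-learning: learn action VALUES; the policy is implicit (greedy in Q).
Q = np.zeros((n_states, n_actions))
def q_update(s, a, r, s2):
    Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])

# Policy gradient (REINFORCE): parameterize the POLICY directly and push
# its parameters toward actions that earned a high return G.
theta = np.zeros((n_states, n_actions))
def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()
def pg_update(s, a, G):
    grad = -softmax(theta[s])
    grad[a] += 1.0                 # gradient of log pi(a|s) for softmax
    theta[s] += alpha * G * grad

q_update(0, 1, 1.0, 2)             # one TD backup toward r + gamma*max Q
pg_update(0, 1, G=1.0)             # one step up the return-weighted gradient
```

Q-learning's max over next actions makes it off-policy, while REINFORCE updates the very distribution it samples from, which is the other practical difference worth mentioning.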
    How to use a deep model for DRL?
    I noticed most DRL papers use very shallow models like three or four layers. However, when I try to do DRL tasks that have relatively complicated scenes (for example, some modern video game), shallow models become way too weak. Are there papers, blogs, articles etc. that use more complex/deep models? Or maybe some methods that can deal with complicated scenes without deep models? Thanks submitted by /u/seermer [link] [comments]  ( 1 min )
    Any thought on: A universal parameter optimizer
    Hi all, I have a thought on a universal parameter optimizer that I wanted to share with you, to see if you know some related work. Assume you have a simulation or access to an environment. There are certain parameters you can set to control the performance of a system which lives in this simulation/environment. Naturally, one wants to find the optimal parameters, or the optimal policy for setting the parameters, that results in the most reward, however that is defined. For example, in the stock market, I may want to find the optimal market price to buy and sell at, or the optimal policy. In a car-driving game, I may want to determine the optimal policy for setting speed and direction. Do we know if there is a formal way to study this type of problem? Thank you! submitted by /u/DB8868 [link] [comments]  ( 1 min )
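One formal framing for the static-parameter case is black-box (derivative-free) optimization, with bandits or policy search covering the sequential case. A minimal random-search sketch on a toy simulator (all names and the reward function here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(params):
    # Toy stand-in for an environment rollout: reward peaks at (2, -1).
    return -np.sum((params - np.array([2.0, -1.0])) ** 2)

# Plain random search: sample candidate parameter vectors, keep the best.
best_params, best_reward = None, -np.inf
for _ in range(2000):
    candidate = rng.uniform(-5, 5, size=2)
    reward = simulate(candidate)
    if reward > best_reward:
        best_params, best_reward = candidate, reward
```

More sample-efficient relatives of the same idea include Bayesian optimization, the cross-entropy method, and evolution strategies; all treat the simulator as an opaque reward oracle.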
  • Open

    Building Trust with Responsible AI
    Artificial intelligence is being used in almost every aspect of life. AI symbolizes growth and productivity in the minds of some, but it also raises questions about the fairness, privacy, and security of these systems. Many legitimate issues exist, including biased choices, labor replacement, and a lack of security. When it comes to robots, this is very frightening: self-driving automobiles, for example, can cause injury or death if they make mistakes. Responsible AI addresses these difficulties and makes AI systems more accountable. Responsible AI should fulfill the following aims. Interpretability: when we interpret a model, we obtain an explanation for how it makes predictions. An AI system makes predictions for a user, and even if these decisions are correct, the user is likely to seek an explanation; Responsible AI describes how we create interpretable models. Fairness: AI systems have the potential to make judgments that are biased against particular groups of people, and this bias stems from bias in the training data. The more interpretable a model is, the easier it is to assure fairness and rectify any bias. As a result, we need a Responsible AI framework to explain how we evaluate fairness and what to do if a model makes unjust predictions. Safety and security: AI systems aren’t deterministic; when confronted with new situations, they are prone to making poor choices, and they can even be tampered with into making unwise decisions. We therefore need to ensure safety and security in these systems. Data governance: the data used must be of high quality; if the data used by AI has errors, the system may make wrong decisions. Continue Reading The Article Here https://preview.redd.it/9iivp31ir6r81.png?width=1024&format=png&auto=webp&s=207409694b68a1e985ad1dfcf3b466ac25916da2 submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
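The fairness aim above is often made concrete with a quantitative check such as demographic parity; a minimal sketch (my illustration, not from the linked article):

```python
import numpy as np

def demographic_parity_gap(y_pred, group):
    """Absolute difference in positive-prediction rate between two groups.
    Zero means the model selects both groups at the same rate."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate_a = y_pred[group == 0].mean()
    rate_b = y_pred[group == 1].mean()
    return abs(rate_a - rate_b)

preds  = [1, 0, 1, 1, 0, 0, 1, 0]
groups = [0, 0, 0, 0, 1, 1, 1, 1]
gap = demographic_parity_gap(preds, groups)   # 0.75 vs 0.25 -> gap of 0.5
```

Demographic parity is only one of several competing fairness definitions (equalized odds and calibration are others), and they generally cannot all hold at once.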
    It's unbelievable what ML can do! Disco+RIFE= 7hr Colab Run...
    submitted by /u/JoshGrambo [link] [comments]
    Creating A Chatbot with transformers and Gradio
    submitted by /u/Illustrious_Row_9971 [link] [comments]
    Researchers Develop Parking Analytics Framework Using Deep Learning
    Artificial Intelligence and deep learning in video analytics are gaining popularity. It has enabled a wide range of industrial applications, including surveillance and public safety, robotics perception, medical intervention, and facial recognition. According to Markets & Markets, the global market for video analytics was valued at USD 5.9 billion in 2021 and is predicted to reach USD 14.9 billion by 2026. Unmanned aerial vehicles (UAVs) have also enabled a wide range of video analytics applications (e.g., aerial surveys) since they provide aerial views of the environment, allowing for collecting aerial photos and processing with deep learning algorithms. Parking analytics is one of these critical smart city applications that uses deep learning and UAVs to collect real-time data and analyze it in order to maximize parking revenue, enhance parking resource allocations, and better manage public space. Continue Reading Paper: https://arxiv.org/pdf/2203.07792.pdf submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    How will AI impact games
    submitted by /u/GravermanYT [link] [comments]
  • Open

    Oscillations in RLC circuits
    Electrical and mechanical oscillations satisfy analogous equations. This is the basis of using the word “analog” in electronics. You could study a mechanical system by building an analogous circuit and measuring that circuit in a lab. Mass, dashpot, spring Years ago I wrote a series of four posts about mechanical vibrations: Free, undamped vibrations Free, […] Oscillations in RLC circuits first appeared on John D. Cook.  ( 2 min )
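    The analogy the excerpt refers to can be written out explicitly: the series RLC equation matches the mass-dashpot-spring equation term for term.

    ```latex
    % Forced, damped mechanical oscillator (mass m, damping c, stiffness k):
    m\,\ddot{x} + c\,\dot{x} + k\,x = F(t)
    % Series RLC circuit (inductance L, resistance R, capacitance C), charge q:
    L\,\ddot{q} + R\,\dot{q} + \tfrac{1}{C}\,q = V(t)
    % Correspondence:
    m \leftrightarrow L, \quad c \leftrightarrow R, \quad k \leftrightarrow \tfrac{1}{C}, \quad F(t) \leftrightarrow V(t)
    ```

    Measuring voltages in the circuit then reads off the displacement of the analogous mechanical system.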
  • Open

    [D] How Logistic Regression nomogram is constructed from binary classifier?
    Well, I've been reading some scientific works and I don't understand how nomograms are constructed from logistic regression models. Here is the example I have: https://ieeexplore-1ieee-1org-1000007l206e9.han.bg.pg.edu.pl/document/9514609 They train an LR model on a Covid-19 dataset [died / didn't die], so it's a binary classification problem. However, later on they construct a nomogram, which determines whether there is a low/moderate/high risk of Covid-19 mortality. What I don't understand is how they calculate the score to establish the chances of death, e.g. if the score is <0.05 there is a low possibility that the patient will die. My general question is: how did they construct this nomogram from the binary classifier they had? submitted by /u/s168501 [link] [comments]  ( 1 min )
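    For context on the general recipe (not the specific paper): a nomogram is a graphical re-expression of the fitted LR equation. Each feature's contribution to the linear predictor becomes "points", the total maps through the sigmoid to a probability, and the probability is binned into risk bands. A sketch with made-up coefficients and cutoffs:

    ```python
    import math

    # Hypothetical logistic-regression coefficients (NOT from the cited paper):
    # log-odds of death = intercept + sum(coef_i * feature_i)
    INTERCEPT = -6.0
    COEFS = {"age": 0.05, "crp": 0.02, "spo2_low": 1.2}

    def predicted_risk(features):
        """Convert the fitted LR model's linear score into a probability."""
        score = INTERCEPT + sum(COEFS[k] * v for k, v in features.items())
        return 1.0 / (1.0 + math.exp(-score))  # sigmoid

    def risk_band(p, cutoffs=(0.05, 0.20)):
        """Bin the probability into the low/moderate/high bands a nomogram reports."""
        low, high = cutoffs
        if p < low:
            return "low"
        return "moderate" if p < high else "high"

    p = predicted_risk({"age": 70, "crp": 50, "spo2_low": 1})
    print(p, risk_band(p))
    ```

    The nomogram's printed axes are just this computation tabulated so a clinician can do it by ruler; the band cutoffs are a separate clinical choice layered on top of the classifier.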
    [R] Neural Head Avatars from Monocular RGB Videos (CVPR 2022)
    submitted by /u/Mandelmus100 [link] [comments]
    [D] Predicting hard properties of graphs using Machine Learning
    Hello everyone, there is a lot of work in the field of geometric deep learning for combinatorial optimization that yields good approximation algorithms for "hard" problems on graphs (see here), with the most prominent example being the TSP. However, as far as I can see, all of these problems share the fact that the computed solution is a subset of the vertices/edges of the original graph. In my field (graph drawing), one of the most important properties is the crossing number. Hence, the solution would not consist of a labeling of the edges/vertices, but is rather a regression task on the whole graph. I have a dataset that consists of roughly 10,000 graphs together with their crossing numbers. Treating the above problem as a supervised regression task and simply feeding the graph into a GNN does not work for me at all. Is this a problem with my choice of architecture, or is this sort of "function" that maps a graph to its crossing number something we can expect no current architecture to learn? I appreciate any comment, even if it is just your intuition on the problem. Best regards, MrLemming submitted by /u/MrLemming2 [link] [comments]  ( 1 min )
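    For reference, graph-level regression with a GNN typically ends in a global readout (mean or sum pooling over node states) feeding a scalar head. A minimal untrained numpy sketch of that computation pattern follows; the features, weights, and two message-passing rounds are arbitrary illustrative choices, not a crossing-number predictor:

    ```python
    import numpy as np

    def graph_regression(adj, n_rounds=2, seed=0):
        """Toy message passing + global readout: node updates from neighbour
        aggregation, then mean-pooling the node states into one graph vector."""
        rng = np.random.default_rng(seed)
        n = adj.shape[0]
        h = np.ones((n, 4))                     # initial node features
        w_msg = rng.normal(size=(4, 4)) * 0.1   # untrained weights: a sketch only
        w_out = rng.normal(size=4) * 0.1
        for _ in range(n_rounds):
            msgs = adj @ h                      # aggregate neighbour features
            h = np.tanh(h + msgs @ w_msg)       # node update
        graph_vec = h.mean(axis=0)              # permutation-invariant readout
        return float(graph_vec @ w_out)         # scalar graph-level prediction

    # K5 (complete graph on 5 vertices) has crossing number 1, but the untrained
    # weights here produce an arbitrary value -- this only shows the shape of the
    # computation, not a trained predictor.
    k5 = np.ones((5, 5)) - np.eye(5)
    print(graph_regression(k5))
    ```

    One caveat worth noting: purely local message passing can struggle with globally determined quantities like the crossing number, which is one reason graph-level targets are often attacked with global attention, positional encodings, or spectral features on top of this basic pattern.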
    [N] Announcement: Call for Papers for our ICML ShiftHappens Workshop!
    Dear community, I hope I do not violate the rules by advertising our Call for Papers here. In a nutshell, submissions can be robustness or OOD datasets and new metrics, which we will consolidate into one benchmark. More info on our website. I am happy to answer any questions regarding the call. submitted by /u/helavisa4 [link] [comments]  ( 1 min )
    [P] OpenAI Codex helping to write shell commands
    submitted by /u/tomd_96 [link] [comments]  ( 1 min )
    [R][P] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 1 min )
    [D] PetaFLOPS as a Unit of Measure in Machine Learning Applications
    I was looking at this paper (https://arxiv.org/pdf/2005.14165.pdf) and came across a graph of validation loss versus compute. I am trying to understand the following things about it: What is PetaFLOP/s-days? I read that a PetaFLOP is 1,000,000,000,000,000 (10^15) calculations (e.g. additions, subtractions). I am guessing that 10^2 would imply 100 * 10^15 calculations per day - is this correct? Is there any difference between PetaFLOP-days and PetaFLOP/s-days? (I also find it interesting that they refer to "computing resources" simply as "compute".) What does "C" stand for in L = 2.57 * C^-0.048? I am guessing that the dotted line refers to the average loss for neural networks with differing numbers of parameters - but what exactly does "C" stand for? Finally, is there a reason that "Validation Loss" is not expressed as a percentage? For instance, what is a validation loss of 3? Is a validation loss of 3 the same as a loss of 30%? Or does validation loss simply refer to the value of the loss function obtained during the validation stage of cross-validation? Thank you! submitted by /u/blueest [link] [comments]  ( 2 min )
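    For context: a PetaFLOP/s-day is a total amount of computation (10^15 floating-point operations per second, sustained for one day), not a rate. In the GPT-3 paper's figure, C on the x-axis is training compute measured in these units, and the validation loss is autoregressive cross-entropy (in nats), which is why it is not a percentage. A small sketch of the arithmetic:

    ```python
    # One petaFLOP/s-day: 1e15 floating-point operations per second, sustained
    # for one day. It is a quantity of compute (operations), not a rate.
    PFS_DAY = 1e15 * 24 * 3600          # total operations in one petaFLOP/s-day

    def pfs_days(flops_per_second, seconds):
        """Total training compute expressed in petaFLOP/s-days."""
        return flops_per_second * seconds / PFS_DAY

    # e.g. a machine sustaining 100 PFLOP/s for one week:
    c = pfs_days(100e15, 7 * 24 * 3600)
    print(c)  # 700.0

    # The dotted line is the fitted scaling law L(C) = 2.57 * C**-0.048, with C
    # in petaFLOP/s-days and L the validation cross-entropy loss in nats.
    loss = 2.57 * c ** -0.048
    print(loss)
    ```

    So "PetaFLOP-days" and "PetaFLOP/s-days" name the same quantity of operations; the "/s" makes explicit that the unit is a rate multiplied by a duration.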
  • Open

    What is an Experimentation program and Who is Involved? (Experimentation Program Series: Guide 02)
    In my previous post, I briefly described how leading companies use experimentation to optimize their products and services and evolve them to the point of feeling elegant, efficient, and magical. These companies have developed mature experimentation programs (ExPrs), including the… Read More The post What is an Experimentation program and Who is Involved? (Experimentation Program Series: Guide 02) appeared first on ML in Production.  ( 6 min )
  • Open

    NN from Scratch: #1 Data Preprocessing | Kolbenkraft
    submitted by /u/cjmodi306 [link] [comments]
    C++ Machine Learning Book
    Hey, guys. Just want to ask if anybody's interested in a C++ machine learning book, "Hands-on Machine Learning with C++" by Kirill Kolodiazhnyi. If you are, send me a DM. submitted by /u/edmondgrasa [link] [comments]
    Efficient net vs resnet
    As the title says, I would like to know, or get some pointers on, when in general an EfficientNet is preferred to a ResNet. I understand that the paper compares performance and shows higher performance with respect to every network. So my question is: is that always the case, or is there a specific situation where it would be better? Sorry for the typos (on my mobile) submitted by /u/johnyj01 [link] [comments]  ( 1 min )
  • Open

    MyStyle: A Personalized Generative Prior. (arXiv:2203.17272v1 [cs.CV])
    We introduce MyStyle, a personalized deep generative prior trained with a few shots of an individual. MyStyle allows one to reconstruct, enhance and edit images of a specific person, such that the output is faithful to the person's key facial characteristics. Given a small reference set of portrait images of a person (~100), we tune the weights of a pretrained StyleGAN face generator to form a local, low-dimensional, personalized manifold in the latent space. We show that this manifold constitutes a personalized region that spans latent codes associated with diverse portrait images of the individual. Moreover, we demonstrate that we obtain a personalized generative prior, and propose a unified approach to apply it to various ill-posed image enhancement problems, such as inpainting and super-resolution, as well as semantic editing. Using the personalized generative prior we obtain outputs that exhibit high-fidelity to the input images and are also faithful to the key facial characteristics of the individual in the reference set. We demonstrate our method with fair-use images of numerous widely recognizable individuals for whom we have the prior knowledge for a qualitative evaluation of the expected outcome. We evaluate our approach against few-shot baselines and show that our personalized prior, quantitatively and qualitatively, outperforms state-of-the-art alternatives.  ( 2 min )
    Muesli: Combining Improvements in Policy Optimization. (arXiv:2104.06159v2 [cs.LG] UPDATED)
    We propose a novel policy update that combines regularized policy optimization with model learning as an auxiliary loss. The update (henceforth Muesli) matches MuZero's state-of-the-art performance on Atari. Notably, Muesli does so without using deep search: it acts directly with a policy network and has computation speed comparable to model-free baselines. The Atari results are complemented by extensive ablations, and by additional results on continuous control and 9x9 Go.  ( 2 min )
    Improved Relation Networks for End-to-End Speaker Verification and Identification. (arXiv:2203.17218v1 [eess.AS])
    Speaker identification systems in a real-world scenario are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples for each enrolled speaker. This paper demonstrates the effectiveness of meta-learning and relation networks for this use case. We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification. The use of relation networks facilitates joint training of the frontend speaker encoder and the backend model. Inspired by the use of prototypical networks in speaker verification and to increase the discriminability of the speaker embeddings, we train the model to classify samples in the current episode amongst all speakers present in the training set. Furthermore, we propose a new training regime for faster model convergence by extracting more information from a given meta-learning episode with negligible extra computation. We evaluate the proposed techniques on VoxCeleb, SITW and VCTK datasets on the tasks of speaker verification and unseen speaker identification. The proposed approach outperforms the existing approaches consistently on both tasks.  ( 2 min )
    Demystifying the Transferability of Adversarial Attacks in Computer Networks. (arXiv:2110.04488v3 [cs.CR] UPDATED)
    Convolutional Neural Network (CNN) models are among the most frequently used deep learning networks, and are extensively used in both academia and industry. Recent studies demonstrated that adversarial attacks against such models can maintain their effectiveness even when used on models other than the one targeted by the attacker. This major property is known as transferability, and it makes CNNs ill-suited for security applications. In this paper, we provide the first comprehensive study which assesses the robustness of CNN-based models for computer networks against adversarial transferability. Furthermore, we investigate whether the transferability property holds in computer network applications. In our experiments, we first consider five different attacks: the Iterative Fast Gradient Sign Method (I-FGSM), the Jacobian-based Saliency Map Attack (JSMA), the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) attack, the Projected Gradient Descent (PGD) attack, and the DeepFool attack. Then, we perform these attacks against three well-known datasets: the Network-based Detection of IoT (N-BaIoT) dataset, the Domain Generating Algorithms (DGA) dataset, and the RIPE Atlas dataset. Our experimental results show clearly that transferability happens in specific use cases for the I-FGSM, the JSMA, and the L-BFGS attacks. In such scenarios, the attack success rate on the target network ranges from 63.00% to 100%. Finally, we suggest two shielding strategies to hinder the attack transferability, by considering the Most Powerful Attacks (MPAs) and a mismatched LSTM architecture.  ( 2 min )
    DeepEdge: A Deep Reinforcement Learning based Task Orchestrator for Edge Computing. (arXiv:2110.01863v2 [cs.NI] UPDATED)
    The improvements in the edge computing technology pave the road for diversified applications that demand real-time interaction. However, due to the mobility of the end-users and the dynamic edge environment, it becomes challenging to handle the task offloading with high performance. Moreover, since each application in mobile devices has different characteristics, a task orchestrator must be adaptive and have the ability to learn the dynamics of the environment. For this purpose, we develop a deep reinforcement learning based task orchestrator, DeepEdge, which learns to meet different task requirements without needing human interaction even under the heavily-loaded stochastic network conditions in terms of mobile users and applications. Given the dynamic offloading requests and time-varying communication conditions, we successfully model the problem as a Markov process and then apply the Double Deep Q-Network (DDQN) algorithm to implement DeepEdge. To evaluate the robustness of DeepEdge, we experiment with four different applications including image rendering, infotainment, pervasive health, and augmented reality in the network under various loads. Furthermore, we compare the performance of our agent with the four different task offloading approaches in the literature. Our results show that DeepEdge outperforms its competitors in terms of the percentage of satisfactorily completed tasks.  ( 2 min )
    Model-based Reinforcement Learning: A Survey. (arXiv:2006.16712v4 [cs.LG] UPDATED)
    Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey also draws connections to several related RL fields, like hierarchical RL and transfer learning. Altogether, the survey presents a broad conceptual overview of the combination of planning and learning for MDP optimization.  ( 2 min )
    TransEditor: Transformer-Based Dual-Space GAN for Highly Controllable Facial Editing. (arXiv:2203.17266v1 [cs.CV])
    Recent advances like StyleGAN have promoted the growth of controllable facial editing. To address its core challenge of attribute decoupling in a single latent space, attempts have been made to adopt dual-space GAN for better disentanglement of style and content representations. Nonetheless, these methods are still incompetent to obtain plausible editing results with high controllability, especially for complicated attributes. In this study, we highlight the importance of interaction in a dual-space GAN for more controllable editing. We propose TransEditor, a novel Transformer-based framework to enhance such interaction. Besides, we develop a new dual-space editing and inversion strategy to provide additional editing flexibility. Extensive experiments demonstrate the superiority of the proposed framework in image quality and editing capability, suggesting the effectiveness of TransEditor for highly controllable facial editing.  ( 2 min )
    A Unifying Framework for Reinforcement Learning and Planning. (arXiv:2006.15009v4 [cs.LG] UPDATED)
    Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight in the algorithmic design space of planning and reinforcement learning.  ( 2 min )
    Data Augmentation for Opcode Sequence Based Malware Detection. (arXiv:2106.11821v2 [cs.CR] UPDATED)
    In this paper we study data augmentation for opcode sequence based Android malware detection. Data augmentation has been successfully used in many areas of deep-learning to significantly improve model performance. Typically, data augmentation simulates realistic variations in data to increase the apparent diversity of the training-set. However, for opcode-based malware analysis it is not immediately clear how to apply data augmentation. Hence we first study the use of fixed transformations, then progress to adaptive methods. We propose a novel data augmentation method -- Self-Embedding Language Model Augmentation -- that uses a malware detection network's own opcode embedding layer to measure opcode similarity for adaptive augmentation. To the best of our knowledge this is the first paper to carry out a systematic study of different augmentation methods for opcode sequence based Android malware classification.  ( 2 min )
    Recommender Systems meet Mechanism Design. (arXiv:2110.12558v2 [cs.GT] UPDATED)
    Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, however, these mechanisms can be extremely sensitive to the Bayesian prior that they target, which becomes problematic when that prior is only approximately known. At the same time, even if access to the exact Bayesian prior is given, it is known that optimal or even approximately optimal multi-item mechanisms run into sample, computational, representation and communication intractability barriers. We consider a natural class of multi-item mechanism design problems with very large numbers of items, but where the bidders' value distributions can be well-approximated by a topic model akin to those used in recommendation systems with very large numbers of possible recommendations. We propose a mechanism design framework for this setting, building on a recent robustification framework by Brustle et al., which disentangles the statistical challenge of estimating a multi-dimensional prior from the task of designing a good mechanism for it, and robustifies the performance of the latter against the estimation error of the former. We provide an extension of this framework appropriate for our setting, which allows us to exploit the expressive power of topic models to reduce the effective dimensionality of the mechanism design problem and remove the dependence of its computational, communication and representation complexity on the number of items.  ( 2 min )
    Preliminary Steps Towards Federated Sentiment Classification. (arXiv:2107.11956v2 [cs.CL] UPDATED)
    Automatically mining the sentiment tendency contained in natural language is fundamental research for some artificial intelligence applications, where solutions alternate with challenges. Transfer learning and multi-task learning techniques have been leveraged to mitigate supervision sparsity and to coordinate multiple heterogeneous domains, respectively. In recent years, the sensitive nature of users' private data has raised another challenge for sentiment classification, i.e., data privacy protection. In this paper, we resort to federated learning for multiple domain sentiment classification under the constraint that the corpora must be stored on decentralized devices. In view of the heterogeneous semantics across multiple parties and the peculiarities of word embedding, we pertinently provide corresponding solutions. First, we propose a Knowledge Transfer Enhanced Private-Shared (KTEPS) framework for better model aggregation and personalization in federated sentiment classification. Second, we propose KTEPS$^\star$ with the consideration of the rich semantic and huge embedding size properties of word vectors, utilizing Projection-based Dimension Reduction (PDR) methods for privacy protection and efficient transmission simultaneously. We construct two federated sentiment classification scenarios based on public benchmarks, and verify the superiority of our proposed methods with extensive experimental investigations.  ( 2 min )
    Causal Feature Selection for Algorithmic Fairness. (arXiv:2006.06053v2 [cs.LG] UPDATED)
    The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high quality training data, most of the fairness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to improve the complexity of the approach. We theoretically prove the correctness of the proposed algorithm to identify features that ensure interventional fairness and show that sub-linear conditional independence tests are sufficient to identify these variables. A detailed empirical evaluation is performed on real-world datasets to demonstrate the efficacy and efficiency of our technique.  ( 2 min )
    When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?. (arXiv:2110.04184v2 [cs.LG] UPDATED)
    Multi-agent reinforcement learning has made substantial empirical progresses in solving games with a large number of players. However, theoretically, the best known sample complexity for finding a Nash equilibrium in general-sum games scales exponentially in the number of players due to the size of the joint action space, and there is a matching exponential lower bound. This paper investigates what learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games with $H$ steps, $S$ states, and $A_i$ actions per player. First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes. This is the first line of results for learning CCE and CE with sample complexities polynomial in $\max_{i\le m} A_i$. Our algorithm for learning CE integrates an adversarial bandit subroutine which minimizes a weighted swap regret, along with several novel designs in the outer loop. Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\epsilon$-approximate Nash equilibrium within $\widetilde{\mathcal{O}}(S\sum_{i\le m} A_i / \epsilon^3)$ episodes (when only highlighting the dependence on $S$, $A_i$, and $\epsilon$), which only depends linearly in $\sum_{i\le m} A_i$ and significantly improves over existing efficient algorithm in the $\epsilon$ dependence. Overall, our results shed light on what equilibria or structural assumptions on the game may enable sample-efficient learning with many players.  ( 2 min )
    TUNet: A Block-online Bandwidth Extension Model based on Transformers and Self-supervised Pretraining. (arXiv:2110.13492v4 [cs.LG] UPDATED)
    We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of the TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the quality of bandwidth extended signals and reduce the sensitivity with respect to downsampling methods. Experiment results on the VCTK dataset show that the proposed method outperforms several recent baselines in both intrusive and non-intrusive metrics. Pretraining and filter augmentation also help stabilize and enhance the overall performance.  ( 2 min )
    Instance Weighted Incremental Evolution Strategies for Reinforcement Learning in Dynamic Environments. (arXiv:2010.04605v2 [cs.LG] UPDATED)
    Evolution strategies (ES), a family of black-box optimization algorithms, have recently emerged as a scalable alternative to reinforcement learning (RL) approaches such as Q-learning or policy gradient, and are much faster when many central processing units (CPUs) are available due to better parallelization. In this paper, we propose a systematic incremental learning method for ES in dynamic environments. The goal is to adjust a previously learned policy to a new one incrementally whenever the environment changes. We incorporate an instance weighting mechanism with ES to facilitate its learning adaptation, while retaining the scalability of ES. During parameter updating, higher weights are assigned to instances that contain more new knowledge, thus encouraging the search distribution to move towards new promising areas of parameter space. We propose two easy-to-implement metrics to calculate the weights: instance novelty and instance quality. Instance novelty measures an instance's difference from the previous optimum in the original environment, while instance quality corresponds to how well an instance performs in the new environment. The resulting algorithm, Instance Weighted Incremental Evolution Strategies (IW-IES), is verified to achieve significantly improved performance on challenging RL tasks ranging from robot navigation to locomotion. This paper thus introduces a family of scalable ES algorithms for RL domains that enables rapid learning adaptation to dynamic environments.  ( 2 min )
    Rerunning OCR: A Machine Learning Approach to Quality Assessment and Enhancement Prediction. (arXiv:2110.01661v4 [cs.CL] UPDATED)
    Iterating with new and improved OCR solutions enforces decision making when it comes to targeting the right candidates for reprocessing. This especially applies when the underlying data collection is of considerable size and rather diverse in terms of fonts, languages, periods of publication and consequently OCR quality. This article captures the efforts of the National Library of Luxembourg to support those targeting decisions. They are crucial in order to guarantee low computational overhead and reduced quality degradation risks, combined with a more quantifiable OCR improvement. In particular, this work explains the methodology of the library with respect to text block level quality assessment. Through extension of this technique, a regression model, that is able to take into account the enhancement potential of a new OCR engine, is also presented. They both mark promising approaches, especially for cultural institutions dealing with historical data of lower quality.  ( 2 min )
    Continual Speaker Adaptation for Text-to-Speech Synthesis. (arXiv:2103.14512v2 [cs.CL] UPDATED)
    Training a multi-speaker Text-to-Speech (TTS) model from scratch is computationally expensive and adding new speakers to the dataset requires the model to be re-trained. The naive solution of sequential fine-tuning of a model for new speakers can lead to poor performance of older speakers. This phenomenon is known as catastrophic forgetting. In this paper, we look at TTS modeling from a continual learning perspective, where the goal is to add new speakers without forgetting previous speakers. Therefore, we first propose an experimental setup and show that serial fine-tuning for new speakers can cause the forgetting of the earlier speakers. Then we exploit two well-known techniques for continual learning, namely experience replay and weight regularization. We reveal how one can mitigate the effect of degradation in speech synthesis diversity in sequential training of new speakers using these methods. Finally, we present a simple extension to experience replay to improve the results in extreme setups where we have access to very small buffers.  ( 2 min )
    Adversarial Examples in Random Neural Networks with General Activations. (arXiv:2203.17209v1 [cs.LG])
    A substantial body of empirical work documents the lack of robustness in deep learning models to adversarial examples. Recent theoretical work proved that adversarial examples are ubiquitous in two-layers networks with sub-exponential width and ReLU or smooth activations, and multi-layer ReLU networks with sub-exponential width. We present a result of the same type, with no restriction on width and for general locally Lipschitz continuous activations. More precisely, given a neural network $f(\,\cdot\,;{\boldsymbol \theta})$ with random weights ${\boldsymbol \theta}$, and feature vector ${\boldsymbol x}$, we show that an adversarial example ${\boldsymbol x}'$ can be found with high probability along the direction of the gradient $\nabla_{{\boldsymbol x}}f({\boldsymbol x};{\boldsymbol \theta})$. Our proof is based on a Gaussian conditioning technique. Instead of proving that $f$ is approximately linear in a neighborhood of ${\boldsymbol x}$, we characterize the joint distribution of $f({\boldsymbol x};{\boldsymbol \theta})$ and $f({\boldsymbol x}';{\boldsymbol \theta})$ for ${\boldsymbol x}' = {\boldsymbol x}-s({\boldsymbol x})\nabla_{{\boldsymbol x}}f({\boldsymbol x};{\boldsymbol \theta})$.  ( 2 min )
    A 23 MW data centre is all you need. (arXiv:2203.17265v1 [cs.LG])
    The field of machine learning has achieved striking progress in recent years, witnessing breakthrough results on language modelling, protein folding and nitpickingly fine-grained dog breed classification. Some even succeeded at playing computer games and board games, a feat both of engineering and of setting their employers' expectations. The central contribution of this work is to carefully examine whether this progress, and technology more broadly, can be expected to continue indefinitely. Through a rigorous application of statistical theory and failure to extrapolate beyond the training data, we answer firmly in the negative and provide details: technology will peak at 3:07 am (BST) on 20th July, 2032. We then explore the implications of this finding, discovering that individuals awake at this ungodly hour with access to a sufficiently powerful computer possess an opportunity for myriad forms of long-term linguistic 'lock in'. All we need is a large (>> 1W) data centre to seize this pivotal moment. By setting our analogue alarm clocks, we propose a tractable algorithm to ensure that, for the future of humanity, the British spelling of colour becomes the default spelling across more than 80% of the global word processing software market.  ( 2 min )
    Automatic Detection of Expressed Emotion from Five-Minute Speech Samples: Challenges and Opportunities. (arXiv:2203.17242v1 [cs.SD])
    We present a novel feasibility study on the automatic recognition of Expressed Emotion (EE), a family environment concept based on caregivers speaking freely about their relative/family member. We describe an automated approach for determining the \textit{degree of warmth}, a key component of EE, from acoustic and text features acquired from a sample of 37 recorded interviews. These recordings, collected over 20 years ago, are derived from a nationally representative birth cohort of 2,232 British twin children and were manually coded for EE. We outline the core steps of extracting usable information from recordings with highly variable audio quality and assess the efficacy of four machine learning approaches trained with different combinations of acoustic and text features. Despite the challenges of working with this legacy data, we demonstrated that the degree of warmth can be predicted with an $F_{1}$-score of \textbf{61.5\%}. In this paper, we summarise our learning and provide recommendations for future work using real-world speech samples.
    VL-InterpreT: An Interactive Visualization Tool for Interpreting Vision-Language Transformers. (arXiv:2203.17247v1 [cs.CV])
    Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems. However, although visualization and interpretability tools have become available for NLP models, internal mechanisms of vision and multimodal transformers remain largely opaque. With the success of these transformers, it is increasingly critical to understand their inner workings, as unraveling these black-boxes will lead to more capable and trustworthy models. To contribute to this quest, we propose VL-InterpreT, which provides novel interactive visualizations for interpreting the attentions and hidden representations in multimodal transformers. VL-InterpreT is a task agnostic and integrated tool that (1) tracks a variety of statistics in attention heads throughout all layers for both vision and language components, (2) visualizes cross-modal and intra-modal attentions through easily readable heatmaps, and (3) plots the hidden representations of vision and language tokens as they pass through the transformer layers. In this paper, we demonstrate the functionalities of VL-InterpreT through the analysis of KD-VLP, an end-to-end pretraining vision-language multimodal transformer-based model, in the tasks of Visual Commonsense Reasoning (VCR) and WebQA, two visual question answering benchmarks. Furthermore, we also present a few interesting findings about multimodal transformer behaviors that were learned through our tool.
    Wind Farm Layout Optimisation using Set Based Multi-objective Bayesian Optimisation. (arXiv:2203.17065v1 [stat.ML])
    Wind energy is one of the cleanest renewable electricity sources and can help in addressing the challenge of climate change. One of the drawbacks of wind-generated energy is the large space necessary to install a wind farm; this arises from the fact that placing wind turbines in a limited area would hinder their productivity and therefore not be economically convenient. This naturally leads to an optimisation problem, which has three specific challenges: (1) multiple conflicting objectives, (2) computationally expensive simulation models, and (3) optimisation over design sets instead of design vectors. The first and second challenges can be addressed by using surrogate-assisted (e.g., Bayesian) multi-objective optimisation. However, traditional Bayesian optimisation cannot be applied, as the objective function in this problem relies on design sets instead of design vectors. This paper extends the applicability of Bayesian multi-objective optimisation to set-based optimisation for solving the wind farm layout problem. We use a set-based kernel in a Gaussian process to quantify the correlation between wind farms (with a different number of turbines). The results on the given data set of wind energy and direction clearly show the potential of using set-based Bayesian multi-objective optimisation.
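    A minimal sketch of one possible set-based kernel between layouts with different numbers of turbines: a mean pairwise RBF over turbine coordinates. The paper's actual kernel may differ, and the lengthscale `ell` is an assumed parameter.

```python
import numpy as np

def set_kernel(A, B, ell=1.0):
    """Mean pairwise RBF kernel between two sets of turbine coordinates.

    A: (n, 2) array, B: (m, 2) array -- the layouts may have different sizes.
    """
    # Squared distances between every turbine in A and every turbine in B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2)).mean()

farm1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # 3-turbine layout
farm2 = np.array([[0.1, 0.0], [1.1, 0.1]])               # similar 2-turbine layout
farm3 = farm1 + 100.0                                    # distant layout

print(set_kernel(farm1, farm2), set_kernel(farm1, farm3))
```

Because the kernel averages over all turbine pairs, it is well defined for farms of unequal size, which is exactly what a Gaussian process over design sets needs.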
    OoD-Bench: Quantifying and Understanding Two Dimensions of Out-of-Distribution Generalization. (arXiv:2106.03721v3 [cs.LG] UPDATED)
    Deep learning has achieved tremendous success with independent and identically distributed (i.i.d.) data. However, the performance of neural networks often degenerates drastically when encountering out-of-distribution (OoD) data, i.e., when training and test data are sampled from different distributions. While a plethora of algorithms have been proposed for OoD generalization, our understanding of the data used to train and evaluate these algorithms remains stagnant. In this work, we first identify and measure two distinct kinds of distribution shifts that are ubiquitous in various datasets. Next, through extensive experiments, we compare OoD generalization algorithms across two groups of benchmarks, each dominated by one of the distribution shifts, revealing their strengths on one shift as well as limitations on the other shift. Overall, we position existing datasets and algorithms from seemingly unconnected research areas into the same coherent picture. This picture may serve as a foothold for future OoD generalization research. Our code is available at https://github.com/ynysjtu/ood_bench.
    Does Audio Deepfake Detection Generalize?. (arXiv:2203.16263v2 [cs.SD] UPDATED)
    Current text-to-speech algorithms produce realistic fakes of human voices, making deepfake detection a much-needed area of research. While researchers have presented various techniques for detecting audio spoofs, it is often unclear exactly why these architectures are successful: Preprocessing steps, hyperparameter settings, and the degree of fine-tuning are not consistent across related work. Which factors contribute to success, and which are accidental? In this work, we address this problem: We systematize audio spoofing detection by re-implementing and uniformly evaluating architectures from related work. We identify overarching features for successful audio deepfake detection, such as using cqtspec or logspec features instead of melspec features, which improves performance by 37% EER on average, all other factors constant. Additionally, we evaluate generalization capabilities: We collect and publish a new dataset consisting of 37.9 hours of found audio recordings of celebrities and politicians, of which 17.2 hours are deepfakes. We find that related work performs poorly on such real-world data (performance degradation of up to one thousand percent). This may suggest that the community has tailored its solutions too closely to the prevailing ASVSpoof benchmark and that deepfakes are much harder to detect outside the lab than previously thought.
    Optimisation-free Classification and Density Estimation with Quantum Circuits. (arXiv:2203.14452v2 [quant-ph] UPDATED)
    We demonstrate the implementation of a novel machine learning framework for probability density estimation and classification using quantum circuits. The framework maps a training data set or a single data sample to the quantum state of a physical system through quantum feature maps. The quantum state of the arbitrarily large training data set summarises its probability distribution in a finite-dimensional quantum wave function. By projecting the quantum state of a new data sample onto the quantum state of the training data set, one can derive statistics to classify or estimate the density of the new data sample. Remarkably, the implementation of our framework on a real quantum device does not require any optimisation of quantum circuit parameters. Nonetheless, we discuss a variational quantum circuit approach that could leverage quantum advantage for our framework.
    Deep Multi-modal Fusion of Image and Non-image Data in Disease Diagnosis and Prognosis: A Review. (arXiv:2203.15588v2 [cs.LG] UPDATED)
    The rapid development of diagnostic technologies in healthcare is leading to higher requirements for physicians to handle and integrate the heterogeneous, yet complementary data that are produced during routine practice. For instance, the personalized diagnosis and treatment planning for a single cancer patient relies on the various images (e.g., radiological, pathological, and camera images) and non-image data (e.g., clinical data and genomic data). However, such decision-making procedures can be subjective, qualitative, and have large inter-subject variabilities. With the recent advances in multi-modal deep learning technologies, an increasingly large number of efforts have been devoted to a key question: how do we extract and aggregate multi-modal information to ultimately provide more objective, quantitative computer-aided clinical decision making? This paper reviews the recent studies on dealing with such a question. Briefly, this review will include the (1) overview of current multi-modal learning workflows, (2) summarization of multi-modal fusion methods, (3) discussion of the performance, (4) applications in disease diagnosis and prognosis, and (5) challenges and future directions.
    TraHGR: Transformer for Hand Gesture Recognition via ElectroMyography. (arXiv:2203.16336v2 [eess.SP] UPDATED)
    Deep learning-based Hand Gesture Recognition (HGR) via surface Electromyogram (sEMG) signals has recently shown significant potential for development of advanced myoelectric-controlled prosthesis. Existing deep learning approaches typically include only a single model and, as such, can hardly maintain acceptable generalization performance in changing scenarios. In this paper, we aim to address this challenge by capitalizing on the recent advances of hybrid models and transformers. In other words, we propose a hybrid framework based on the transformer architecture, which is a relatively new and revolutionary deep learning model. The proposed hybrid architecture, referred to as the Transformer for Hand Gesture Recognition (TraHGR), consists of two parallel paths followed by a linear layer that acts as a fusion center to integrate the advantage of each module and provide robustness over different scenarios. We evaluated the proposed architecture TraHGR based on the commonly used second Ninapro dataset, referred to as the DB2. The sEMG signals in the DB2 dataset are measured in real-life conditions from 40 healthy users, each performing 49 gestures. We have conducted an extensive set of experiments to test and validate the proposed TraHGR architecture, and have compared its achievable accuracy with more than five recently proposed HGR classification algorithms over the same dataset. We have also compared the results of the proposed TraHGR architecture with each individual path and demonstrated the distinguishing power of the proposed hybrid architecture. The recognition accuracies of the proposed TraHGR architecture are 86.18%, 88.91%, 81.44%, and 93.84%, which are 2.48%, 5.12%, 8.82%, and 4.30% higher than the state-of-the-art performance for DB2 (49 gestures), DB2-B (17 gestures), DB2-C (23 gestures), and DB2-D (9 gestures), respectively.
    D-Grasp: Physically Plausible Dynamic Grasp Synthesis for Hand-Object Interactions. (arXiv:2112.03028v2 [cs.RO] CROSS LISTED)
    We introduce the dynamic grasp synthesis task: given an object with a known 6D pose and a grasp reference, our goal is to generate motions that move the object to a target 6D pose. This is challenging, because it requires reasoning about the complex articulation of the human hand and the intricate physical interaction with the object. We propose a novel method that frames this problem in the reinforcement learning framework and leverages a physics simulation, both to learn and to evaluate such dynamic interactions. A hierarchical approach decomposes the task into low-level grasping and high-level motion synthesis. It can be used to generate novel hand sequences that approach, grasp, and move an object to a desired location, while retaining human-likeness. We show that our approach leads to stable grasps and generates a wide range of motions. Furthermore, even imperfect labels can be corrected by our method to generate dynamic interaction sequences.
    Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment. (arXiv:2203.15937v1 [eess.AS] CROSS LISTED)
    Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge of such end-to-end solutions is the scarcity of human-annotated phonemes on natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model, and fine-tune it using original labeled L2 speech samples plus the created pseudo-labeled L2 speech samples. Our pseudo labels are dynamic and are produced by an ensemble of the online model on-the-fly, which ensures that our model is robust to pseudo label noise. We show that fine-tuning with pseudo labels gains a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline. The proposed PL method is also shown to outperform conventional offline PL methods. Compared to the state-of-the-art MDD systems, our MDD solution achieves a more accurate and consistent phonetic error diagnosis. In addition, we conduct an open test on a separate UTD-4Accents dataset, where our system recognition outputs show a strong correlation with human perception, based on accentedness and intelligibility.
    UNICON: Combating Label Noise Through Uniform Selection and Contrastive Learning. (arXiv:2203.14542v2 [cs.CV] UPDATED)
    Supervised deep learning methods require a large repository of annotated data; hence, label noise is inevitable. Training with such noisy data negatively impacts the generalization performance of deep neural networks. To combat label noise, recent state-of-the-art methods employ some sort of sample selection mechanism to select a possibly clean subset of data. Next, an off-the-shelf semi-supervised learning method is used for training where rejected samples are treated as unlabeled data. Our comprehensive analysis shows that current selection methods disproportionately select samples from easy (fast learnable) classes while rejecting those from relatively harder ones. This creates class imbalance in the selected clean set and in turn, deteriorates performance under high label noise. In this work, we propose UNICON, a simple yet effective sample selection method which is robust to high label noise. To address the disproportionate selection of easy and hard samples, we introduce a Jensen-Shannon divergence based uniform selection mechanism which requires neither probabilistic modeling nor hyperparameter tuning. We complement our selection method with contrastive learning to further combat the memorization of noisy labels. Extensive experimentation on multiple benchmark datasets demonstrates the effectiveness of UNICON; we obtain an 11.4% improvement over the current state-of-the-art on the CIFAR100 dataset with a 90% noise rate. Our code is publicly available.
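    An illustrative sketch of the selection idea; the JSD scoring and per-class top-k below are a simplification, not UNICON's exact mechanism. Each sample is scored by the Jensen-Shannon divergence between its predicted distribution and its given (one-hot) label, and the same number of lowest-divergence samples is kept from every class, avoiding the class imbalance of global thresholds.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence between rows of p and q (base-2 logs, so in [0, 1])."""
    m = 0.5 * (p + q)
    kl = lambda a, b: (a * (np.log2(a + eps) - np.log2(b + eps))).sum(-1)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def uniform_select(probs, labels, n_classes, k):
    """Keep the k lowest-divergence (most likely clean) samples from every class."""
    onehot = np.eye(n_classes)[labels]
    scores = js_divergence(probs, onehot)
    keep = []
    for c in range(n_classes):
        idx = np.where(labels == c)[0]
        keep.extend(idx[np.argsort(scores[idx])[:k]])
    return np.array(sorted(keep))

rng = np.random.default_rng(1)
labels = np.repeat(np.arange(3), 10)             # 10 samples per class
probs = rng.dirichlet(np.ones(3), size=30)       # mock model predictions
sel = uniform_select(probs, labels, n_classes=3, k=4)
print(len(sel))                                  # 12 selected: 4 per class
```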
    Well-classified Examples are Underestimated in Classification with Deep Neural Networks. (arXiv:2110.06537v5 [cs.LG] UPDATED)
    The conventional wisdom behind learning deep classification models is to focus on badly classified examples and ignore well-classified examples that are far from the decision boundary. For instance, when training with cross-entropy loss, examples with higher likelihoods (i.e., well-classified examples) contribute smaller gradients in back-propagation. However, we theoretically show that this common practice hinders representation learning, energy optimization, and margin growth. To counteract this deficiency, we propose to reward well-classified examples with additive bonuses to revive their contribution to the learning process. This counterexample theoretically addresses these three issues. We empirically support this claim by directly verifying the theoretical results or through the significant performance improvement of our counterexample on diverse tasks, including image classification, graph classification, and machine translation. Furthermore, this paper shows that we can deal with complex scenarios, such as imbalanced classification, OOD detection, and applications under adversarial attacks because our idea can solve these three issues. Code is available at: https://github.com/lancopku/well-classified-examples-are-underestimated.
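    A hedged sketch of the additive-bonus idea in the binary case; the bonus log(1 - p) below is one simple instantiation, and the paper uses a bounded, conservative variant. The point it illustrates: cross-entropy's logit gradient vanishes as p approaches 1, while the bonus keeps well-classified examples contributing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ce_grad(z):
    """d/dz of cross-entropy -log(sigmoid(z)): vanishes as the example
    becomes well classified (p -> 1)."""
    return sigmoid(z) - 1.0

def encouraging_grad(z):
    """d/dz of -log(p) + log(1 - p), i.e. cross-entropy plus an additive
    bonus log(1 - p) (a sketch of the idea, not the paper's bounded bonus).
    The gradient stays at -1 for every z, reviving easy examples."""
    return (sigmoid(z) - 1.0) - sigmoid(z)

z_easy, z_hard = 5.0, 0.4   # logits: confidently correct vs borderline
print(abs(ce_grad(z_easy)), abs(ce_grad(z_hard)))                     # ~0.007 vs ~0.40
print(abs(encouraging_grad(z_easy)), abs(encouraging_grad(z_hard)))   # ~1.0 for both
```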
    Omnivore: A Single Model for Many Visual Modalities. (arXiv:2201.08377v2 [cs.CV] UPDATED)
    Prior work has studied different visual modalities in isolation and developed separate architectures for recognition of images, videos, and 3D data. Instead, in this paper, we propose a single model which excels at classifying images, videos, and single-view 3D data using exactly the same model parameters. Our 'Omnivore' model leverages the flexibility of transformer-based architectures and is trained jointly on classification tasks from different modalities. Omnivore is simple to train, uses off-the-shelf standard datasets, and performs at-par or better than modality-specific models of the same size. A single Omnivore model obtains 86.0% on ImageNet, 84.1% on Kinetics, and 67.1% on SUN RGB-D. After finetuning, our models outperform prior work on a variety of vision tasks and generalize across modalities. Omnivore's shared visual representation naturally enables cross-modal recognition without access to correspondences between modalities. We hope our results motivate researchers to model visual modalities together.
    Speaker Embedding-aware Neural Diarization: an Efficient Framework for Overlapping Speech Diarization in Meeting Scenarios. (arXiv:2203.09767v2 [cs.SD] UPDATED)
    Overlapping speech diarization has been traditionally treated as a multi-label classification problem. In this paper, we reformulate this task as a single-label prediction problem by encoding multiple binary labels into a single label with the power set, which represents the possible combinations of target speakers. This formulation has two benefits. First, the overlaps of target speakers are explicitly modeled. Second, threshold selection is no longer needed. Through this formulation, we propose the speaker embedding-aware neural diarization (SEND) framework, where a speech encoder, a speaker encoder, two similarity scorers, and a post-processing network are jointly optimized to predict the encoded labels according to the similarities between speech features and speaker embeddings. Experimental results show that SEND has a stable learning process and can be trained on highly overlapped data without extra initialization. More importantly, our method achieves the state-of-the-art performance in real meeting scenarios with fewer model parameters and lower computational complexity.
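    The power-set encoding can be sketched as a bitmask over the N target speakers, turning every combination of simultaneously active speakers into one of 2^N single-label classes (an illustrative sketch, not the paper's implementation):

```python
def encode(active_speakers, n_speakers):
    """Map a set of simultaneously active speakers to one power-set class id."""
    label = 0
    for s in active_speakers:
        label |= 1 << s           # one bit per target speaker
    return label                  # in [0, 2**n_speakers)

def decode(label, n_speakers):
    """Recover the set of active speakers from the power-set class id."""
    return {s for s in range(n_speakers) if label & (1 << s)}

# A frame where speakers 0 and 2 talk over each other (3 target speakers):
cls = encode({0, 2}, n_speakers=3)
print(cls, decode(cls, n_speakers=3))   # 5 {0, 2}
```

Since overlap combinations are now ordinary classes, a single argmax replaces per-speaker thresholds, which matches the two benefits claimed above.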
    Forecasting from LiDAR via Future Object Detection. (arXiv:2203.16297v2 [cs.CV] UPDATED)
    Object detection and forecasting are fundamental components of embodied perception. These two problems, however, are largely studied in isolation by the community. In this paper, we propose an end-to-end approach for detection and motion forecasting based on raw sensor measurement as opposed to ground truth tracks. Instead of predicting the current frame locations and forecasting forward in time, we directly predict future object locations and backcast to determine where each trajectory began. Our approach not only improves overall accuracy compared to other modular or end-to-end baselines, it also prompts us to rethink the role of explicit tracking for embodied perception. Additionally, by linking future and current locations in a many-to-one manner, our approach is able to reason about multiple futures, a capability that was previously considered difficult for end-to-end approaches. We conduct extensive experiments on the popular nuScenes dataset and demonstrate the empirical effectiveness of our approach. In addition, we investigate the appropriateness of reusing standard forecasting metrics for an end-to-end setup, and find a number of limitations which allow us to build simple baselines to game these metrics. We address this issue with a novel set of joint forecasting and detection metrics that extend the commonly used AP metrics from the detection community to measuring forecasting accuracy. Our code is available at https://github.com/neeharperi/FutureDet
    Graph Neural Networks in IoT: A Survey. (arXiv:2203.15935v2 [cs.LG] UPDATED)
    The Internet of Things (IoT) boom has revolutionized almost every corner of people's daily lives: healthcare, home, transportation, manufacturing, supply chain, and so on. With the recent development of sensor and communication technologies, IoT devices including smart wearables, cameras, smartwatches, and autonomous vehicles can accurately measure and perceive their surrounding environment. Continuous sensing generates massive amounts of data and presents challenges for machine learning. Deep learning models (e.g., convolutional neural networks and recurrent neural networks) have been extensively employed in solving IoT tasks by learning patterns from multi-modal sensory data. Graph Neural Networks (GNNs), an emerging and fast-growing family of neural network models, can capture complex interactions within sensor topology and have been demonstrated to achieve state-of-the-art results in numerous IoT learning tasks. In this survey, we present a comprehensive review of recent advances in the application of GNNs to the IoT field, including a deep dive analysis of GNN design in various IoT sensing environments, an overarching list of public data and source code from the collected publications, and future research directions. To keep track of newly published works, we collect representative papers and their open-source implementations and create a Github repository at https://github.com/GuiminDong/GNN4IoT.
    Longitudinal Fairness with Censorship. (arXiv:2203.16024v2 [cs.LG] UPDATED)
    Recent works in artificial intelligence fairness attempt to mitigate discrimination by proposing constrained optimization programs that achieve parity for some fairness statistic. Most assume availability of the class label, which is impractical in many real-world applications such as precision medicine, actuarial analysis and recidivism prediction. Here we consider fairness in longitudinal right-censored environments, where the time to event might be unknown, resulting in censorship of the class label and inapplicability of existing fairness studies. We devise applicable fairness measures, propose a debiasing algorithm, and provide necessary theoretical constructs to bridge fairness with and without censorship for these important and socially-sensitive tasks. Our experiments on four censored datasets confirm the utility of our approach.
    An Evaluation Dataset for Legal Word Embedding: A Case Study On Chinese Codex. (arXiv:2203.15173v1 [cs.CL] CROSS LISTED)
    Word embedding is a modern distributed word representation approach widely used in many natural language processing tasks. Converting the vocabulary in a legal document into a word embedding model facilitates subjecting legal documents to machine learning, deep learning, and other algorithms and subsequently performing the downstream tasks of natural language processing vis-à-vis, for instance, document classification, contract review, and machine translation. The most common and practical approach to accuracy evaluation with the word embedding model uses a benchmark set with linguistic rules or the relationship between words to perform analogy reasoning via algebraic calculation. This paper proposes establishing a 1,134 Legal Analogical Reasoning Questions Set (LARQS) from the 2,388 Chinese Codex corpus using five kinds of legal relations, which are then used to evaluate the accuracy of the Chinese word embedding model. Moreover, we discovered that legal relations might be ubiquitous in the word embedding model.
    Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization. (arXiv:2107.12438v3 [math.OC] UPDATED)
    Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, utilizes all data when training and, hence, is well-suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly-coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and policy class's complexity. Our bounds show that under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case-study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.
    STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity. (arXiv:2203.09611v2 [cs.LG] UPDATED)
    Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. The comparison results with several baseline methods show that the STICC outperforms others significantly in terms of adjusted rand index and macro-F1 score. Join count statistics are also calculated and show that the spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in fields such as geography, remote sensing, transportation, and urban planning.
    Image Compression and Actionable Intelligence With Deep Neural Networks. (arXiv:2203.13686v2 [cs.LG] UPDATED)
    If a unit cannot receive intelligence from a source due to external factors, we consider it a disadvantaged user. We categorize this as a preoccupied unit working on a low-connectivity device on the edge. This case requires that we use a different approach to deliver intelligence, particularly satellite imagery information, than normally employed. To address this, we propose a survey of information reduction techniques to deliver the information from a satellite image in a smaller package. We investigate four techniques to aid in the reduction of delivered information: traditional image compression, neural network image compression, object detection image cutout, and image to caption. Each of these mechanisms has its benefits and tradeoffs when considered for a disadvantaged user.
    Continual Learning for Unsupervised Anomaly Detection in Continuous Auditing of Financial Accounting Data. (arXiv:2112.13215v2 [cs.LG] UPDATED)
    International audit standards require the direct assessment of a financial statement's underlying accounting journal entries. Driven by advances in artificial intelligence, deep-learning inspired audit techniques have emerged to examine vast quantities of journal entry data. However, in regular audits, most of the proposed methods are applied to learn from a comparably stationary journal entry population, e.g., of a financial quarter or year, ignoring situations where audit-relevant distribution changes are not evident in the training data or become incrementally available over time. In contrast, in continuous auditing, deep-learning models are continually trained on a stream of recorded journal entries, e.g., of the last hour, resulting in situations where previous knowledge interferes with new information and is entirely overwritten. This work proposes a continual anomaly detection framework that overcomes both challenges and is designed to learn from a stream of journal entry data experiences. The framework is evaluated on deliberately designed audit scenarios and two real-world datasets. Our experimental results provide initial evidence that such a learning scheme offers the ability to reduce false-positive alerts and false-negative decisions.
    Radial Autoencoders for Enhanced Anomaly Detection. (arXiv:2203.15884v2 [cs.LG] UPDATED)
    In classification problems, supervised machine-learning methods outperform traditional algorithms, thanks to the ability of neural networks to learn complex patterns. However, in two-class classification tasks like anomaly or fraud detection, unsupervised methods could do even better, because their prediction is not limited to previously learned types of anomalies. An intuitive approach to anomaly detection can be based on the distances from the centers of mass of the two respective classes. Autoencoders, although trained without supervision, can also detect anomalies: considering the center of mass of the normal points, reconstructions now have radii, with the largest radii most likely indicating anomalous points. Of course, radius-based classification was already possible without interposing an autoencoder: in any space, radial classification can be performed, to some extent. In order to outperform it, we proceed to radial deformations of data (i.e., centric compressions or expansions of axes) and autoencoder training. Any autoencoder that makes use of a data center is here baptized a centric autoencoder (cAE). A special type is the cAE trained with a uniformly compressed dataset, named the centripetal autoencoder (cpAE). The new concept is studied here in relation with a schematic artificial dataset, and the derived methods show consistent score improvements. But tested on real banking data, our radial deformation supervised algorithms alone still perform better than cAEs, as expected from most supervised methods; nonetheless, in hybrid approaches, cAEs can be combined with a radial deformation of space, improving the classification score. We expect that centric autoencoders will become irreplaceable objects in live anomaly detection based on geometry, thanks to their ability to build naturally on geometrical algorithms and to their native capability of detecting unknown anomaly types.
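    The plain radial baseline the abstract starts from can be sketched as follows, with synthetic Gaussian data and an assumed 99th-percentile threshold (a minimal illustration only; a cAE would replace this raw distance with reconstruction radii, and a cpAE would first compress the data toward the center):

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 2))   # training "normal" class
center = normal.mean(axis=0)                   # center of mass

def radial_score(x):
    """Anomaly score = radius from the normal class's center of mass."""
    return np.linalg.norm(x - center, axis=-1)

# Flag points whose radius exceeds the 99th percentile of training radii
# (the quantile is an assumed operating point, not from the paper).
threshold = np.quantile(radial_score(normal), 0.99)

inlier = np.array([0.1, -0.2])
outlier = np.array([6.0, 6.0])
print(radial_score(inlier) < threshold, radial_score(outlier) > threshold)
```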
    Deep Reinforcement Learning for Resource Constrained Multiclass Scheduling in Wireless Networks. (arXiv:2011.13634v3 [cs.LG] UPDATED)
    The problem of resource-constrained scheduling in a dynamic and heterogeneous wireless setting is considered here. In our setup, the available limited bandwidth resources are allocated in order to serve randomly arriving service demands, which in turn belong to different classes in terms of payload data requirement, delay tolerance, and importance/priority. In addition to heterogeneous traffic, another major challenge stems from random service rates due to time-varying wireless communication channels. Various approaches to scheduling and resource allocation can be used, ranging from simple greedy heuristics and constrained optimization to combinatorics. These methods are tailored to specific network or application configurations and are usually suboptimal. To this end, we resort to deep reinforcement learning (DRL) and propose a distributional Deep Deterministic Policy Gradient (DDPG) algorithm combined with Deep Sets to tackle the aforementioned problem. Furthermore, we present a novel way to use a Dueling Network, which leads to further performance improvement. Our proposed algorithm is tested on both synthetic and real data, showing consistent gains against state-of-the-art conventional methods from combinatorics, optimization, and scheduling metrics.
    ME-CapsNet: A Multi-Enhanced Capsule Networks with Routing Mechanism. (arXiv:2203.15547v3 [cs.CV] UPDATED)
    Convolutional Neural Networks need the construction of informative features, which are determined by channel-wise and spatial-wise information at the network's layers. In this research, we focus on a novel solution that uses sophisticated optimization to enhance both the spatial and channel components inside each layer's receptive field. Capsule Networks are used to understand the spatial association between features in the feature map. Standalone capsule networks have shown better results on comparatively simple datasets than on complex ones, owing to the inordinate amount of feature information. To tackle this issue, we propose ME-CapsNet, which introduces deeper convolutional layers to extract important features before passing them strategically through modules of capsule layers, improving the performance of the network significantly. The deeper convolutional layers include blocks of Squeeze-Excitation networks which use a stochastic sampling approach for progressively reducing the spatial size, thereby dynamically recalibrating the channels by reconstructing their interdependencies without much loss of important feature information. Extensive experimentation on commonly used datasets demonstrates the efficiency of the proposed ME-CapsNet, which clearly outperforms various prior works, achieving higher accuracy with minimal model complexity on complex datasets.
    BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning. (arXiv:2203.01522v2 [cs.CV] CROSS LISTED)
    Despite the success of deep neural networks, there are still many challenges in deep representation learning due to data scarcity issues such as data imbalance, unseen distributions, and domain shift. To address these issues, a variety of methods have been devised to explore sample relationships in a vanilla way (i.e., from the perspective of either the input or the loss function), failing to explore the internal structure of deep neural networks for learning with sample relationships. Inspired by this, we propose to equip deep neural networks themselves with the ability to learn sample relationships from each mini-batch. Specifically, we introduce a batch transformer module, BatchFormer, which is applied to the batch dimension of each mini-batch to implicitly explore sample relationships during training. By doing this, the proposed method enables the collaboration of different samples; e.g., head-class samples can also contribute to the learning of tail classes in long-tailed recognition. Furthermore, to mitigate the gap between training and testing, we share the classifier between passes with and without BatchFormer during training, so that BatchFormer can be removed at test time. We perform extensive experiments on over ten datasets, and the proposed method achieves significant improvements on different data scarcity applications without any bells and whistles, including the tasks of long-tailed recognition, compositional zero-shot learning, domain generalization, and contrastive learning. Code will be made publicly available at https://github.com/zhihou7/BatchFormer.
    Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis. (arXiv:2203.17263v1 [cs.CV])
    Since facial actions such as lip movements contain significant information about speech content, it is not surprising that audio-visual speech enhancement methods are more accurate than their audio-only counterparts. Yet, state-of-the-art approaches still struggle to generate clean, realistic speech without noise artifacts and unnatural distortions in challenging acoustic environments. In this paper, we propose a novel audio-visual speech enhancement framework for high-fidelity telecommunications in AR/VR. Our approach leverages audio-visual speech cues to generate the codes of a neural speech codec, enabling efficient synthesis of clean, realistic speech from noisy signals. Given the importance of speaker-specific cues in speech, we focus on developing personalized models that work well for individual speakers. We demonstrate the efficacy of our approach on a new audio-visual speech dataset collected in an unconstrained, large vocabulary setting, as well as existing audio-visual datasets, outperforming speech enhancement baselines on both quantitative metrics and human evaluation studies. Please see the supplemental video for qualitative results at https://github.com/facebookresearch/facestar/releases/download/paper_materials/video.mp4.
    CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition. (arXiv:2203.17023v1 [cs.SD])
    Previous research has looked into ways to improve speech emotion recognition (SER) by utilizing both acoustic and linguistic cues of speech. However, the potential association between state-of-the-art ASR models and the SER task has yet to be investigated. In this paper, we propose a novel channel- and temporal-wise attention RNN (CTA-RNN) architecture based on the intermediate representations of pre-trained ASR models. Specifically, the embeddings of a large-scale pre-trained end-to-end ASR encoder contain both acoustic and linguistic information, as well as the ability to generalize to different speakers, making them well suited for the downstream SER task. To further exploit the embeddings from different layers of the ASR encoder, we propose a novel CTA-RNN architecture to capture the emotionally salient parts of the embeddings in both the channel and temporal directions. We evaluate our approach on two popular benchmark datasets, IEMOCAP and MSP-IMPROV, using both within-corpus and cross-corpus settings. Experimental results show that our proposed method achieves excellent performance in terms of accuracy and robustness.
    Using IPA-Based Tacotron for Data Efficient Cross-Lingual Speaker Adaptation and Pronunciation Enhancement. (arXiv:2011.06392v2 [cs.SD] UPDATED)
    Recent neural Text-to-Speech (TTS) models have been shown to perform very well when enough data is available. However, fine-tuning them for new speakers or languages is not straightforward in a low-resource setup. In this paper, we show that by applying minor modifications to a Tacotron model, one can transfer an existing TTS model to new speakers from the same or a different language using only 20 minutes of data. For this purpose, we first introduce a base multi-lingual Tacotron with language-agnostic input, then demonstrate how transfer learning is done for different scenarios of speaker adaptation without exploiting any pre-trained speaker encoder or code-switching technique. We evaluate the transferred model using both subjective and objective measures.
    The paradox of the compositionality of natural language: a neural machine translation case study. (arXiv:2108.05885v2 [cs.CL] UPDATED)
    Obtaining human-like performance in NLP is often argued to require compositional generalisation. Whether neural networks exhibit this ability is usually studied by training models on highly compositional synthetic data. However, compositionality in natural language is much more complex than the rigid, arithmetic-like version such data adheres to, and artificial compositionality tests thus do not allow us to determine how neural models deal with more realistic forms of compositionality. In this work, we re-instantiate three compositionality tests from the literature and reformulate them for neural machine translation (NMT). Our results highlight that: i) unfavourably, models trained on more data are more compositional; ii) models are sometimes less compositional than expected, but sometimes more, exemplifying that different levels of compositionality are required, and models are not always able to modulate between them correctly; iii) some of the non-compositional behaviours are mistakes, whereas others reflect the natural variation in data. Apart from an empirical study, our work is a call to action: we should rethink the evaluation of compositionality in neural networks and develop benchmarks using real data to evaluate compositionality on natural language, where composing meaning is not as straightforward as doing the math.
    Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). (arXiv:2203.13366v2 [cs.IR] UPDATED)
    For a long period, different recommendation tasks have typically required designing task-specific architectures and training objectives. As a result, it is hard to transfer the learned knowledge and representations from one task to another, thus restricting the generalization ability of existing recommendation approaches; e.g., a sequential recommendation model can hardly be applied or transferred to a review generation method. To deal with such issues, and considering that language grounding is a powerful medium to describe and represent various problems or tasks, we present a flexible and unified text-to-text paradigm called the "Pretrain, Personalized Prompt, and Predict Paradigm" (P5) for recommendation, which unifies various recommendation tasks in a shared framework. In P5, all data such as user-item interactions, item metadata, and user reviews are converted to a common format: natural language sequences. The rich information from natural language assists P5 in capturing deeper semantics for recommendation. P5 learns different tasks with the same language modeling objective during pretraining. Thus, it has the potential to serve as the foundation model for downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation, which will revolutionize the technical form of recommender systems towards a universal recommendation engine. With adaptive personalized prompts for different users, P5 is able to make predictions in a zero-shot or few-shot manner, largely reducing the need for extensive fine-tuning. On several recommendation benchmarks, we conduct experiments to show the effectiveness of our generative approach. We will release our prompts and pretrained P5 language model to help advance future research on Recommendation as Language Processing (RLP) and Personalized Foundation Models.
    DiGS : Divergence guided shape implicit neural representation for unoriented point clouds. (arXiv:2106.10811v2 [cs.CV] UPDATED)
    Shape implicit neural representations (INRs) have recently been shown to be effective in shape analysis and reconstruction tasks. Existing INRs require point coordinates to learn the implicit level sets of the shape. When a normal vector is available for each point, a higher-fidelity representation can be learned; however, normal vectors are often not provided as raw data. Furthermore, initialization has been shown to play a crucial role in surface reconstruction. In this paper, we propose a divergence-guided shape representation learning approach that does not require normal vectors as input. We show that incorporating a soft constraint on the divergence of the distance function favours smooth solutions that reliably orient gradients to match the unknown normal at each point, in some cases even better than approaches that use ground-truth normal vectors directly. Additionally, we introduce a novel geometric initialization method for sinusoidal INRs that further improves convergence to the desired solution. We evaluate the effectiveness of our approach on the tasks of surface reconstruction and shape space learning and show SOTA performance compared to other unoriented methods. Code and model parameters are available at our project page https://chumbyte.github.io/DiGS-Site/.
    Schema matching using Gaussian mixture models with Wasserstein distance. (arXiv:2111.14244v2 [cs.LG] UPDATED)
    Gaussian mixture models find their place as a powerful tool, mostly in clustering problems, but with proper preparation also in feature extraction, pattern recognition, image segmentation, and machine learning in general. When faced with the problem of schema matching, different mixture models computed on different pieces of data can retain crucial information about the structure of the dataset. In order to measure or compare results from mixture models, the Wasserstein distance can be very useful; however, it is not easy to calculate for mixture distributions. In this paper we derive one possible approximation of the Wasserstein distance between Gaussian mixture models and reduce it to a linear problem. Furthermore, application examples concerning real-world data are shown.
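A common approximation of this kind (a sketch of the general technique, not necessarily the paper's exact derivation) restricts the transport plan to couplings of the mixture components, so the computation becomes a linear program over the component weights. Here in one dimension, where the squared 2-Wasserstein distance between two Gaussians has a simple closed form:

```python
import numpy as np
from scipy.optimize import linprog

def gauss_w2_sq(m1, s1, m2, s2):
    """Closed-form squared 2-Wasserstein distance between 1-D Gaussians."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def mixture_w2(means1, stds1, p, means2, stds2, q):
    """Approximate W2 between two 1-D Gaussian mixtures by optimally
    coupling their components: a linear program over the weights p, q."""
    k, l = len(p), len(q)
    cost = np.array([gauss_w2_sq(means1[i], stds1[i], means2[j], stds2[j])
                     for i in range(k) for j in range(l)])
    A_eq = np.zeros((k + l, k * l))
    for i in range(k):
        A_eq[i, i * l:(i + 1) * l] = 1.0   # row marginals of the coupling = p
    for j in range(l):
        A_eq[k + j, j::l] = 1.0            # column marginals of the coupling = q
    b_eq = np.concatenate([np.asarray(p, float), np.asarray(q, float)])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return np.sqrt(max(res.fun, 0.0))
```

This component-level coupling is an upper bound on the true mixture Wasserstein distance; in higher dimensions only `gauss_w2_sq` changes (to the Bures-Wasserstein formula), while the linear program stays the same.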
    DiffSkill: Skill Abstraction from Differentiable Physics for Deformable Object Manipulations with Tools. (arXiv:2203.17275v1 [cs.LG])
    We consider the problem of sequential robotic manipulation of deformable objects using tools. Previous works have shown that differentiable physics simulators provide gradients to the environment state and help trajectory optimization to converge orders of magnitude faster than model-free reinforcement learning algorithms for deformable object manipulation. However, such gradient-based trajectory optimization typically requires access to the full simulator states and can only solve short-horizon, single-skill tasks due to local optima. In this work, we propose a novel framework, named DiffSkill, that uses a differentiable physics simulator for skill abstraction to solve long-horizon deformable object manipulation tasks from sensory observations. In particular, we first obtain short-horizon skills using individual tools from a gradient-based optimizer, using the full state information in a differentiable simulator; we then learn a neural skill abstractor from the demonstration trajectories which takes RGBD images as input. Finally, we plan over the skills by finding the intermediate goals and then solve long-horizon tasks. We show the advantages of our method in a new set of sequential deformable object manipulation tasks compared to previous reinforcement learning algorithms and compared to the trajectory optimizer.
    A statistical framework for efficient out of distribution detection in deep neural networks. (arXiv:2102.12967v3 [cs.LG] UPDATED)
    Background. Commonly, Deep Neural Networks (DNNs) generalize well on samples drawn from a distribution similar to that of the training set. However, DNNs' predictions are brittle and unreliable when the test samples are drawn from a dissimilar distribution. This is a major concern for deployment in real-world applications, where such behavior may come at a considerable cost, such as in industrial production lines, autonomous vehicles, or healthcare applications. Contributions. We frame Out-Of-Distribution (OOD) detection in DNNs as a statistical hypothesis testing problem. Tests generated within our proposed framework combine evidence from the entire network. Unlike previous OOD detection heuristics, this framework returns a $p$-value for each test sample. It is guaranteed to maintain the Type I Error (T1E - incorrectly predicting OOD for an actual in-distribution sample) for test data. Moreover, this makes it possible to combine several detectors while maintaining the T1E. Building on this framework, we suggest a novel OOD procedure based on low-order statistics. Our method achieves comparable or better results than state-of-the-art methods on well-accepted OOD benchmarks, without retraining the network parameters or assuming prior knowledge of the test distribution -- and at a fraction of the computational cost.
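The general recipe behind such frameworks (a sketch of the flavor of the approach, not the paper's specific low-order-statistics test; function names are illustrative) is to convert per-layer scores into empirical p-values against an in-distribution calibration set and then combine them, e.g. with Fisher's method:

```python
import numpy as np
from scipy.stats import combine_pvalues

def empirical_pvalue(score, calib_scores):
    """p-value of a test score against in-distribution calibration scores:
    the (smoothed) fraction of calibration scores at least as extreme."""
    calib_scores = np.asarray(calib_scores)
    return (1 + np.sum(calib_scores >= score)) / (1 + len(calib_scores))

def ood_pvalue(layer_scores, calib_per_layer):
    """Combine per-layer p-values into one p-value via Fisher's method;
    a small value suggests the sample is out-of-distribution."""
    pvals = [empirical_pvalue(s, c) for s, c in zip(layer_scores, calib_per_layer)]
    _, p = combine_pvalues(pvals, method='fisher')
    return p
```

Because each per-layer p-value is valid under the in-distribution null, the combined test controls the Type I error at any chosen rejection threshold.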
    Weighted Programming. (arXiv:2202.07577v2 [cs.PL] UPDATED)
    We study weighted programming, a programming paradigm for specifying mathematical models. More specifically, the weighted programs we investigate are like usual imperative programs with two additional features: (1) nondeterministic branching and (2) weighting execution traces. Weights can be numbers but also other objects like words from an alphabet, polynomials, formal power series, or cardinal numbers. We argue that weighted programming as a paradigm can be used to specify mathematical models beyond probability distributions (as is done in probabilistic programming). We develop weakest-precondition- and weakest-liberal-precondition-style calculi à la Dijkstra for reasoning about mathematical models specified by weighted programs. We present several case studies. For instance, we use weighted programming to model the ski rental problem, an optimization problem. We model not only the optimization problem itself, but also the best deterministic online algorithm for solving it, as weighted programs. By means of weakest-precondition-style reasoning, we can determine the competitive ratio of the online algorithm at the source-code level.
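As a concrete reminder of the ski rental case study (an illustrative enumeration, independent of the weighted-programming calculi): with buy price $b$, the best deterministic online strategy rents for $b-1$ days and then buys, achieving competitive ratio $2 - 1/b$ against the offline optimum $\min(d, b)$ for $d$ ski days:

```python
def online_cost(buy_price, ski_days, switch_day):
    """Cost of the online strategy: rent (1 per day) before switch_day,
    buy on switch_day if skiing lasts that long."""
    if ski_days < switch_day:
        return ski_days
    return (switch_day - 1) + buy_price

def competitive_ratio(buy_price, switch_day, horizon=1000):
    """Worst-case ratio of online cost to the offline optimum min(d, b),
    over all numbers of ski days d up to a horizon."""
    worst = 0.0
    for d in range(1, horizon + 1):
        worst = max(worst, online_cost(buy_price, d, switch_day) / min(d, buy_price))
    return worst
```

For example, `competitive_ratio(10, 10)` evaluates to $1.9 = 2 - 1/10$, matching the ratio that the paper derives by weakest-precondition-style reasoning over the corresponding weighted program.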
    Hypergraph Convolutional Networks via Equivalency between Hypergraphs and Undirected Graphs. (arXiv:2203.16939v1 [cs.LG])
    As a powerful tool for modeling complex relationships, hypergraphs are gaining popularity in the graph learning community. However, commonly used frameworks in deep hypergraph learning focus on hypergraphs with \textit{edge-independent vertex weights} (EIVWs), without considering hypergraphs with \textit{edge-dependent vertex weights} (EDVWs), which have more modeling power. To compensate for this, we present General Hypergraph Spectral Convolution (GHSC), a general learning framework that not only handles both EDVW and EIVW hypergraphs but, more importantly, enables existing powerful Graph Convolutional Neural Networks (GCNNs) to be utilized in a theoretically explicit way, largely easing the design of hypergraph neural networks. In this framework, the graph Laplacian of a given undirected GCNN is replaced with a unified hypergraph Laplacian that incorporates vertex weight information from a random walk perspective by equating our defined generalized hypergraphs with simple undirected graphs. Extensive experiments from various domains including social network analysis, visual object classification, and protein learning demonstrate that the proposed framework achieves state-of-the-art performance.
    Model Agnostic Defence against Backdoor Attacks in Machine Learning. (arXiv:1908.02203v3 [cs.LG] UPDATED)
    Machine Learning (ML) has automated a multitude of our day-to-day decision-making domains, such as education, employment, and driving automation. The continued success of ML largely depends on our ability to trust the models we are using. Recently, a new class of attacks called backdoor attacks has been developed. These attacks undermine the user's trust in ML models. In this work, we present NEO, a model-agnostic framework to detect and mitigate such backdoor attacks in image classification ML models. For a given image classification model, our approach analyses the inputs it receives and determines if the model is backdoored. In addition, we mitigate these attacks by determining the correct predictions for the poisoned images. An appealing feature of NEO is that it can, for the first time, isolate and reconstruct the backdoor trigger. To the best of our knowledge, NEO is also the first defence methodology that is completely black-box. We have implemented NEO and evaluated it against three state-of-the-art poisoned models. These models include highly critical applications such as traffic sign detection (USTS) and face detection. In our evaluation, we show that NEO can detect $\approx$88% of the poisoned inputs on average, running in as little as 4.4 ms per input image. We also reconstruct the poisoned input for the user to effectively test their systems.
    How Does Pre-trained Wav2Vec2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications. (arXiv:2203.16822v1 [eess.AS])
    Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled speech data to build robust end-to-end (E2E) acoustic models (AMs) that can later be fine-tuned on downstream tasks, e.g., automatic speech recognition (ASR). Yet, few works have investigated the impact on performance when the data substantially differ between the pre-training and downstream fine-tuning phases (i.e., domain shift). We target this scenario by analyzing the robustness of Wav2Vec2.0 and XLS-R models on downstream ASR for a completely unseen domain: air traffic control (ATC) communications. We benchmark the proposed models on four challenging ATC test sets (with signal-to-noise ratios varying between 5 and 20 dB). Relative word error rate (WER) reductions of 20% to 40% are obtained in comparison to hybrid-based state-of-the-art ASR baselines by fine-tuning E2E acoustic models with a small fraction of labeled data. We also study the impact of fine-tuning data size on WERs, going from 5 minutes (few-shot) to 15 hours.
    Two Coupled Rejection Metrics Can Tell Adversarial Examples Apart. (arXiv:2105.14785v4 [cs.LG] UPDATED)
    Correctly classifying adversarial examples is an essential but challenging requirement for safely deploying machine learning models. As reported in RobustBench, even the state-of-the-art adversarially trained models struggle to exceed 67% robust test accuracy on CIFAR-10, which is far from practical. A complementary route towards robustness is to introduce a rejection option, allowing the model to withhold predictions on uncertain inputs, where confidence is a commonly used certainty proxy. Following this routine, we find that confidence and a rectified confidence (R-Con) can form two coupled rejection metrics, which provably distinguish wrongly classified inputs from correctly classified ones. This intriguing property sheds light on using coupling strategies to better detect and reject adversarial examples. We evaluate our rectified rejection (RR) module on CIFAR-10, CIFAR-10-C, and CIFAR-100 under several attacks, including adaptive ones, and demonstrate that the RR module is compatible with different adversarial training frameworks, improving robustness with little extra computation. The code is available at https://github.com/P2333/Rectified-Rejection.
    A Single-Timescale Method for Stochastic Bilevel Optimization. (arXiv:2102.04671v4 [math.OC] UPDATED)
    Stochastic bilevel optimization generalizes classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization has been regaining popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta-learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term the Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single-loop fashion and uses a single-timescale update with a fixed batch size. To achieve an $\epsilon$-stationary point of the bilevel problem, STABLE requires ${\cal O}(\epsilon^{-2})$ samples in total; to achieve an $\epsilon$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(\epsilon^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm achieving the same order of sample complexity as stochastic gradient descent for single-level stochastic optimization.
    DeepFry: Identifying Vocal Fry Using Deep Neural Networks. (arXiv:2203.17019v1 [eess.AS])
    Vocal fry or creaky voice refers to a voice quality characterized by irregular glottal opening and low pitch. It occurs in diverse languages and is prevalent in American English, where it is used not only to mark phrase finality, but also to convey sociolinguistic factors and affect. Due to its irregular periodicity, creaky voice challenges automatic speech processing and recognition systems, particularly for languages where creak is frequently used. This paper proposes a deep learning model to detect creaky voice in fluent speech. The model is composed of an encoder and a classifier trained together. The encoder takes the raw waveform and learns a representation using a convolutional neural network. The classifier is implemented as a multi-headed fully-connected network trained to detect creaky voice, voicing, and pitch, where the last two are used to refine creak prediction. The model is trained and tested on speech of American English speakers, annotated for creak by trained phoneticians. We evaluated the performance of our system using two encoders: one tailored for the task, and the other based on a state-of-the-art unsupervised representation. Results suggest our best-performing system improves recall and F1 scores compared to previous methods on unseen data.
    Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain. (arXiv:2203.17004v1 [eess.AS])
    Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the complex short-time Fourier transform (STFT) domain, proposing a novel training task for speech enhancement using a complex-valued deep neural network. We derive this training task within the formalism of stochastic differential equations, thereby enabling the use of predictor-corrector samplers. We provide alternative formulations inspired by previous publications on using SGMs for speech enhancement, avoiding the need for any prior assumptions on the noise distribution and making the training task purely generative, which, as we show, results in improved enhancement performance.
    Generation and Simulation of Synthetic Datasets with Copulas. (arXiv:2203.17250v1 [cs.LG])
    This paper proposes a new method to generate synthetic data sets based on copula models. Our goal is to produce surrogate data resembling real data in terms of marginal and joint distributions. We present a complete and reliable algorithm for generating a synthetic data set comprising numeric or categorical variables. Applying our methodology to two datasets shows better performance compared to other methods such as SMOTE and autoencoders.
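A minimal Gaussian-copula sketch of the idea (the paper's method may use other copula families or estimators; all names here are illustrative): fit the dependence structure on normal scores of the marginal ranks, then sample from the fitted copula and map back through the empirical marginals, so the surrogate data matches both the marginals and the rank correlations of the real data:

```python
import numpy as np
from scipy import stats

def fit_gaussian_copula(X):
    """Fit a Gaussian copula: ranks -> uniform scores -> normal scores,
    then estimate their correlation. Also keep sorted marginals for inversion."""
    n, _ = X.shape
    U = stats.rankdata(X, axis=0) / (n + 1)   # probability-integral transform
    Z = stats.norm.ppf(U)
    return np.corrcoef(Z, rowvar=False), np.sort(X, axis=0)

def sample_gaussian_copula(corr, sorted_marginals, m, rng=None):
    """Draw m synthetic rows: correlated normals -> uniforms -> empirical
    marginal quantiles."""
    rng = np.random.default_rng(rng)
    d = corr.shape[0]
    Z = rng.multivariate_normal(np.zeros(d), corr, size=m)
    U = stats.norm.cdf(Z)
    n = sorted_marginals.shape[0]
    idx = np.clip((U * n).astype(int), 0, n - 1)
    return np.column_stack([sorted_marginals[idx[:, j], j] for j in range(d)])
```

Categorical variables would additionally need an encoding step (e.g., ordinal codes with randomized ties) before the rank transform.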
    Continuous Scene Representations for Embodied AI. (arXiv:2203.17251v1 [cs.CV])
    We propose Continuous Scene Representations (CSR), a scene representation constructed by an embodied agent navigating within a space, where objects and their relationships are modeled by continuous-valued embeddings. Our method captures feature relationships between objects, composes them into a graph structure on-the-fly, and situates an embodied agent within the representation. Our key insight is to embed pair-wise relationships between objects in a latent space. This allows for a richer representation compared to discrete relations (e.g., [support], [next-to]) commonly used for building scene representations. CSR can track objects as the agent moves in a scene, update the representation accordingly, and detect changes in room configurations. Using CSR, we outperform state-of-the-art approaches on the challenging downstream task of visual room rearrangement, without any task-specific training. Moreover, we show the learned embeddings capture salient spatial details of the scene and show applicability to real-world data. A summary video and code are available at https://prior.allenai.org/projects/csr.
    LEAD1.0: A Large-scale Annotated Dataset for Energy Anomaly Detection in Commercial Buildings. (arXiv:2203.17256v1 [cs.LG])
    Modern buildings are densely equipped with smart energy meters, which periodically generate a massive amount of time-series data, yielding a few million data points every day. This data can be leveraged to discover the underlying loads, infer their energy consumption patterns and inter-dependencies on environmental factors, and characterize the building's operational properties. Furthermore, it allows us to simultaneously identify anomalies present in the electricity consumption profiles, which is a big step towards saving energy and achieving global sustainability. However, to date, the lack of large-scale annotated energy consumption datasets hinders ongoing research in anomaly detection. We contribute to this effort by releasing a well-annotated version of the publicly available ASHRAE Great Energy Predictor III dataset, containing 1,413 smart electricity meter time series spanning over one year. In addition, we benchmark the performance of eight state-of-the-art anomaly detection methods on our dataset and compare their performance.
    Learning from many trajectories. (arXiv:2203.17193v1 [cs.LG])
    We initiate a study of supervised learning from many independent sequences ("trajectories") of non-independent covariates, reflecting tasks in sequence modeling, control, and reinforcement learning. Conceptually, our multi-trajectory setup sits between two traditional settings in statistical learning theory: learning from independent examples and learning from a single auto-correlated sequence. Our conditions for efficient learning generalize the former setting: trajectories must be non-degenerate in ways that extend standard requirements for independent examples. They do not require that trajectories be ergodic, long, or strictly stable. For linear least-squares regression, given $n$-dimensional examples produced by $m$ trajectories, each of length $T$, we observe a notable change in statistical efficiency as the number of trajectories increases from a few (namely $m \lesssim n$) to many (namely $m \gtrsim n$). Specifically, we establish that the worst-case error rate for this problem is $\Theta(n / m T)$ whenever $m \gtrsim n$. Meanwhile, when $m \lesssim n$, we establish a (sharp) lower bound of $\Omega(n^2 / m^2 T)$ on the worst-case error rate, realized by a simple, marginally unstable linear dynamical system. A key upshot is that, in domains where trajectories regularly reset, the error rate eventually behaves as if all of the examples were independent altogether, drawn from their marginals. As a corollary of our analysis, we also improve guarantees for the linear system identification problem.
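The pooled-regression setup is easy to reproduce empirically (an illustrative simulation under an assumed stable dynamical system, not the paper's marginally unstable worst-case construction): draw $m$ resetting trajectories, pool all $mT$ examples, and run ordinary least squares, whose error then shrinks roughly like $n/mT$:

```python
import numpy as np

def pooled_ls_error(n=5, m=50, T=20, noise=0.1, seed=0):
    """Squared parameter error of least squares fit on examples pooled from
    m independent trajectories of a stable linear dynamical system."""
    rng = np.random.default_rng(seed)
    A = 0.9 * np.eye(n)                  # assumed stable dynamics
    theta = rng.standard_normal(n)       # true regression parameter
    X, y = [], []
    for _ in range(m):
        x = rng.standard_normal(n)       # fresh reset per trajectory
        for _ in range(T):
            X.append(x.copy())
            y.append(x @ theta + noise * rng.standard_normal())
            x = A @ x + rng.standard_normal(n)   # auto-correlated covariates
    X, y = np.array(X), np.array(y)
    theta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    return float(np.sum((theta_hat - theta) ** 2))
```

Sweeping `m` from below `n` to well above it is one way to visualize the transition between the $\Omega(n^2/m^2T)$ and $\Theta(n/mT)$ regimes described above.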
    Training strategy for a lightweight countermeasure model for automatic speaker verification. (arXiv:2203.17031v1 [cs.SD])
    The countermeasure (CM) model is developed to protect Automatic Speaker Verification (ASV) systems from spoof attacks and prevent resulting personal information leakage. Based on practicality and security considerations, the CM model is usually deployed on edge devices, which have more limited computing resources and storage space than cloud-based systems. This work proposes training strategies for a lightweight CM model for ASV, using generalized end-to-end (GE2E) pre-training and adversarial fine-tuning to improve performance, and applying knowledge distillation (KD) to reduce the size of the CM model. In the evaluation phase of the ASVspoof 2021 Logical Access task, the lightweight ResNetSE model reaches a min t-DCF of 0.2695 and an EER of 3.54%. Compared to the teacher model, the lightweight student model uses only 22.5% of the parameters and 21.1% of the multiply-and-accumulate operations of the teacher model.
    CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers. (arXiv:2203.17196v1 [cs.SE])
    Users use Issue Tracking Systems to keep track of and manage issue reports in their repositories. An issue is a rich source of software information that can contain different kinds of reports, including a problem, a request for new features, or merely a question about the software product. As the number of these issues increases, it becomes harder to manage them manually. Thus, automatic approaches are proposed to help facilitate the management of issue reports. This paper describes CatIss, an automatic CATegorizer of ISSue reports which is built upon the Transformer-based pre-trained RoBERTa model. CatIss classifies issue reports into three main categories of Bug reports, Enhancement/feature requests, and Questions. First, the datasets provided for the NLBSE tool competition are cleaned and preprocessed. Then, the pre-trained RoBERTa model is fine-tuned on the preprocessed dataset. Evaluating CatIss on about 80 thousand issue reports from GitHub indicates that it performs very well, surpassing the competition baseline, TicketTagger, and achieving an 87.2% F1-score (micro average). Additionally, as CatIss is trained on a wide set of repositories, it is a generic prediction model, hence applicable for any unseen software project or projects with little historical data. Scripts for cleaning the datasets, training CatIss, and evaluating the model are publicly available.
    DINE: Domain Adaptation from Single and Multiple Black-box Predictors. (arXiv:2104.01539v3 [cs.CV] UPDATED)
    To ease the burden of labeling, unsupervised domain adaptation (UDA) aims to transfer knowledge in previous and related labeled datasets (sources) to a new unlabeled dataset (target). Despite impressive progress, prior methods always need to access the raw source data and develop data-dependent alignment approaches to recognize the target samples in a transductive learning manner, which may raise privacy concerns from source individuals. Several recent studies resort to an alternative solution by exploiting the well-trained white-box model from the source domain, yet, it may still leak the raw data through generative adversarial learning. This paper studies a practical and interesting setting for UDA, where only black-box source models (i.e., only network predictions are available) are provided during adaptation in the target domain. To solve this problem, we propose a new two-step knowledge adaptation framework called DIstill and fine-tuNE (DINE). Taking into consideration the target data structure, DINE first distills the knowledge from the source predictor to a customized target model, then fine-tunes the distilled model to further fit the target domain. Besides, neural networks are not required to be identical across domains in DINE, even allowing effective adaptation on a low-resource device. Empirical results on three UDA scenarios (i.e., single-source, multi-source, and partial-set) confirm that DINE achieves highly competitive performance compared to state-of-the-art data-dependent approaches. Code is available at \url{https://github.com/tim-learn/DINE/}.
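The distill step can be sketched in a toy form (our own construction, not the DINE code; the function names and smoothing choice are illustrative): with only the black-box teacher's predicted probabilities available, the student is trained by cross-entropy against (optionally smoothed) soft labels.

```python
import math

# Illustrative sketch: distill a black-box teacher into a student using only
# the teacher's predicted probabilities, via cross-entropy against smoothed
# soft labels. Only network predictions are needed, not source data.
def smooth(probs, eps=0.1):
    k = len(probs)
    return [(1 - eps) * p + eps / k for p in probs]

def distill_loss(teacher_probs, student_logits):
    # cross-entropy H(teacher, student) computed from raw student logits
    m = max(student_logits)
    z = sum(math.exp(l - m) for l in student_logits)
    log_q = [l - m - math.log(z) for l in student_logits]
    return -sum(p * lq for p, lq in zip(smooth(teacher_probs), log_q))

# a student whose logits agree with the teacher incurs a lower loss
loss = distill_loss([0.7, 0.2, 0.1], [2.0, 0.5, 0.0])
print(round(loss, 3))
```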
    Neural Q-learning for solving elliptic PDEs. (arXiv:2203.17128v1 [math.NA])
    Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm in reinforcement learning. Our "Q-PDE" algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units $\rightarrow \infty$. For monotone PDE (i.e. those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, converges in $L^2$ to the PDE solution as the training time $\rightarrow \infty$. More generally, we can prove that any fixed point of the wide-network limit for the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs.
    Ternary and Binary Quantization for Improved Classification. (arXiv:2203.16798v1 [cs.CV])
    Dimension reduction and data quantization are two important methods for reducing data complexity. In this paper, we study the methodology of first reducing data dimension by random projection and then quantizing the projections to ternary or binary codes, which has been widely applied in classification. Usually, the quantization will seriously degrade the accuracy of classification due to high quantization errors. Interestingly, however, we observe that the quantization can provide comparable and often superior accuracy, when the data to be quantized are sparse features generated with common filters. Furthermore, this quantization property is maintained in the random projections of sparse features, if both the features and random projection matrices are sufficiently sparse. By conducting extensive experiments, we validate and analyze this intriguing property.
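The project-then-quantize pipeline described above can be sketched as follows (a minimal toy of our own, assuming a sparse +/-1 random projection and a magnitude threshold; all names and parameter values are illustrative):

```python
import random

# Toy sketch: project a sparse feature vector with a sparse +/-1 random
# matrix, then quantize the projections to ternary codes {-1, 0, +1}
# using a magnitude threshold.
def sparse_random_matrix(k, n, density, rng):
    return [[rng.choice([-1.0, 1.0]) if rng.random() < density else 0.0
             for _ in range(n)] for _ in range(k)]

def ternary_quantize(x, mat, threshold):
    proj = [sum(row[i] * x[i] for i in range(len(x))) for row in mat]
    return [0 if abs(v) < threshold else (1 if v > 0 else -1) for v in proj]

rng = random.Random(0)
x = [0.0] * 100
for i in rng.sample(range(100), 5):   # sparse feature vector: 5 nonzeros
    x[i] = rng.gauss(0.0, 1.0)
mat = sparse_random_matrix(20, 100, density=0.1, rng=rng)
code = ternary_quantize(x, mat, threshold=0.5)
print(code)
```

Setting the threshold to zero recovers binary (sign) quantization as a special case.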
    Performative Power. (arXiv:2203.17232v1 [cs.LG])
    We introduce the notion of performative power, which measures the ability of a firm operating an algorithmic system, such as a digital content recommendation platform, to steer a population. We relate performative power to the economic theory of market power. Traditional economic concepts are well known to struggle with identifying anti-competitive patterns in digital platforms--a core challenge is the difficulty of defining the market, its participants, products, and prices. Performative power sidesteps the problem of market definition by focusing on a directly observable statistical measure instead. High performative power enables a platform to profit from steering participant behavior, whereas low performative power ensures that learning from historical data is close to optimal. Our first general result shows that under low performative power, a firm cannot do better than standard supervised learning on observed data. We draw an analogy with a firm being a price-taker, an economic condition that arises under perfect competition in classical market models. We then contrast this with a market where performative power is concentrated and show that the equilibrium state can differ significantly. We go on to study performative power in a concrete setting of strategic classification where participants can switch between competing firms. We show that monopolies maximize performative power and disutility for the participant, while competition and outside options decrease performative power. We end on a discussion of connections to measures of market power in economics and of the relationship with ongoing antitrust debates.
    Factored Adaptation for Non-Stationary Reinforcement Learning. (arXiv:2203.16582v1 [cs.LG])
    Dealing with non-stationarity in environments (i.e., transition dynamics) and objectives (i.e., reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). Most existing approaches only focus on families of stationary MDPs, in which the non-stationarity is episodic, i.e., the change is only possible across episodes. The few works that do consider non-stationarity without a specific boundary, i.e., also allow for changes within an episode, model the changes monolithically in a single shared embedding vector. In this paper, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that explicitly learns the individual latent change factors affecting the transition dynamics and reward functions. FANS-RL jointly learns the structure of a factored MDP and a factored representation of the time-varying change factors, as well as the specific state components that they affect, via a factored non-stationary variational autoencoder. Through this general framework, we can consider general non-stationary scenarios with different changing function types and changing frequencies. Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of rewards, compactness of the latent state representation, and robustness to varying degrees of non-stationarity.
    A Closer Look at Rehearsal-Free Continual Learning. (arXiv:2203.17269v1 [cs.LG])
    Continual learning describes a setting where machine learning models learn novel concepts from continuously shifting training data, while simultaneously avoiding degradation of knowledge on previously seen classes (a phenomenon known as the catastrophic forgetting problem) which may disappear from the training data for extended periods of time. Current approaches for continual learning of a single expanding task (aka class-incremental continual learning) require extensive rehearsal of previously seen data to avoid this degradation of knowledge. Unfortunately, rehearsal comes at a sharp cost to memory and computation, and it may also violate data-privacy. Instead, we explore combining knowledge distillation and parameter regularization in new ways to achieve strong continual learning performance without rehearsal. Specifically, we take a deep dive into common continual learning techniques: prediction distillation, feature distillation, L2 parameter regularization, and EWC parameter regularization. We first disprove the common assumption that parameter regularization techniques fail for rehearsal-free continual learning of a single, expanding task. Next, we explore how to leverage knowledge from a pre-trained model in rehearsal-free continual learning and find that vanilla L2 parameter regularization outperforms EWC parameter regularization and feature distillation. We then highlight the impact of the rehearsal-free continual learning settings with a classifier expansion benchmark, showing that a strategy based on our findings combined with a positive/negative label balancing heuristic can close the performance gap between the upper bound and the existing strategies by up to roughly 50%. Finally, we show that a simple method consisting of pre-training, L2 regularization, and prediction distillation can even outperform rehearsal-based methods on the common CIFAR-100 benchmark.
    RobIn: A Robust Interpretable Deep Network for Schizophrenia Diagnosis. (arXiv:2203.17085v1 [cs.LG])
    Schizophrenia is a severe mental health condition that requires a long and complicated diagnostic process. However, early diagnosis is vital to control symptoms. Deep learning has recently become a popular way to analyse and interpret medical data. Past attempts to use deep learning for schizophrenia diagnosis from brain-imaging data have shown promise but suffer from a large training-application gap - it is difficult to apply lab research to the real world. We propose to reduce this training-application gap by focusing on readily accessible data. We collect a data set of psychiatric observations of patients based on DSM-5 criteria. Because similar data is already recorded in all mental health clinics that diagnose schizophrenia using DSM-5, our method could be easily integrated into current processes as a tool to assist clinicians, whilst abiding by formal diagnostic criteria. To facilitate real-world usage of our system, we show that it is interpretable and robust. Understanding how a machine learning tool reaches its diagnosis is essential to allow clinicians to trust that diagnosis. To interpret the framework, we fuse two complementary attention mechanisms, 'squeeze and excitation' and 'self-attention', to determine global attribute importance and attribute interactivity, respectively. The model uses these importance scores to make decisions. This allows clinicians to understand how a diagnosis was reached, improving trust in the model. Because machine learning models often struggle to generalise to data from different sources, we perform experiments with augmented test data to evaluate the model's applicability to the real world. We find that our model is more robust to perturbations, and should therefore perform better in a clinical setting. It achieves 98% accuracy with 10-fold cross-validation.
    Cross-modal Learning of Graph Representations using Radar Point Cloud for Long-Range Gesture Recognition. (arXiv:2203.17066v1 [eess.SP])
    Gesture recognition is one of the most intuitive ways of interaction and has gathered particular attention in human-computer interaction. Radar sensors possess multiple intrinsic properties, such as their ability to work in low illumination and harsh weather conditions while being low-cost and compact, making them highly preferable for a gesture recognition solution. However, most literature focuses on solutions with a limited range that is lower than a meter. We propose a novel architecture for a long-range (1m - 2m) gesture recognition solution that leverages a point cloud-based cross-learning approach from camera point cloud to 60-GHz FMCW radar point cloud, which allows learning better representations while suppressing noise. We use a variant of Dynamic Graph CNN (DGCNN) for the cross-learning, enabling us to model relationships between the points at both a local and a global level; to model the temporal dynamics, a Bi-LSTM network is employed. In the experimental results section, we demonstrate our model's overall accuracy of 98.4% for five gestures and its generalization capability.
    PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations. (arXiv:2203.16965v1 [cs.CL])
    While self-supervised speech representation learning (SSL) models serve a variety of downstream tasks, these models have been observed to overfit to the domain from which the unlabelled data originates. To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation), which zeroes out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data. Intuitively, this helps to make space for the target-domain ASR fine-tuning. The redundant weights can be identified through various pruning strategies, which are discussed in detail as a part of this work. Specifically, we investigate the effect of the recently discovered Task-Agnostic and Task-Aware pruning on PADA and propose a new pruning paradigm based on the latter, which we call Cross-Domain Task-Aware Pruning (CD-TAW). CD-TAW obtains the initial pruning mask from a well fine-tuned OOD model, which makes it starkly different from the rest of the pruning strategies discussed in the paper. Our proposed CD-TAW methodology achieves up to 20.6% relative WER improvement over our baseline when fine-tuned on a 2-hour subset of Switchboard data without language model (LM) decoding. Furthermore, we conduct a detailed analysis to highlight the key design choices of our proposed method.
    Mutual information estimation for graph convolutional neural networks. (arXiv:2203.16887v1 [cs.LG])
    Measuring model performance is a key issue for deep learning practitioners. However, we often lack the ability to explain why a specific architecture attains superior predictive accuracy for a given data set. Often, validation accuracy is used as a performance heuristic quantifying how well a network generalizes to unseen data, but it does not capture anything about the information flow in the model. Mutual information can be used as a measure of the quality of internal representations in deep learning models, and the information plane may provide insights into whether the model exploits the available information in the data. The information plane has previously been explored for fully connected neural networks and convolutional architectures. We present an architecture-agnostic method for tracking a network's internal representations during training, which are then used to create the mutual information plane. The method is exemplified for graph-based neural networks fitted on citation data. We compare how the inductive bias introduced in graph-based architectures changes the mutual information plane relative to a fully connected neural network.
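A common building block for such information-plane plots is a binned estimate of mutual information between discretized hidden activations and labels. A minimal sketch (our own construction, not the paper's estimator):

```python
import math
from collections import Counter

# Toy sketch: estimate mutual information I(T; Y) between a binned hidden
# activation T and labels Y from empirical co-occurrence counts, as used
# when plotting information-plane trajectories during training.
def mutual_information(t_bins, labels):
    n = len(t_bins)
    joint = Counter(zip(t_bins, labels))
    pt, py = Counter(t_bins), Counter(labels)
    mi = 0.0
    for (t, y), c in joint.items():
        p_ty = c / n
        mi += p_ty * math.log2(p_ty * n * n / (pt[t] * py[y]))
    return mi

# a perfectly predictive binning recovers I(T; Y) = H(Y) = 1 bit
print(mutual_information([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 1.0
```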
    Doubly-Robust Estimation for Unbiased Learning-to-Rank from Position-Biased Click Feedback. (arXiv:2203.17118v1 [cs.LG])
    Clicks on rankings suffer from position bias: generally items on lower ranks are less likely to be examined - and thus clicked - by users, in spite of their actual preferences between items. The prevalent approach to unbiased click-based Learning-to-Rank (LTR) is based on counterfactual Inverse-Propensity-Scoring (IPS) estimation. Unique about LTR is the fact that standard Doubly-Robust (DR) estimation - which combines IPS with regression predictions - is inapplicable since the treatment variable - indicating whether a user examined an item - cannot be observed in the data. In this paper, we introduce a novel DR estimator that uses the expectation of treatment per rank instead. Our novel DR estimator has more robust unbiasedness conditions than the existing IPS approach, and in addition, provides enormous decreases in variance: our experimental results indicate it requires several orders of magnitude fewer datapoints to converge at optimal performance. For the unbiased LTR field, our DR estimator contributes both increases in state-of-the-art performance and the most robust theoretical guarantees of all known LTR estimators.
    Lossless Speedup of Autoregressive Translation with Generalized Aggressive Decoding. (arXiv:2203.16487v2 [cs.CL] UPDATED)
    In this paper, we propose Generalized Aggressive Decoding (GAD) -- a novel approach to accelerating autoregressive translation with no quality loss, through the collaboration of autoregressive and non-autoregressive translation (NAT) of the Transformer. At each decoding iteration, GAD aggressively decodes a number of tokens in parallel as a draft through NAT and then verifies them in the autoregressive manner, where only the tokens that pass the verification are kept as decoded tokens. GAD can achieve the same performance as autoregressive translation but much more efficiently because both NAT drafting and autoregressive verification are fast due to parallel computing. We conduct experiments on the WMT14 English-German translation task and confirm that the vanilla GAD yields exactly the same results as greedy decoding with an around 3x speedup, and that its variant (GAD++) with an advanced verification strategy not only outperforms greedy translation, achieving translation quality comparable to beam search, but also further improves the decoding speed, resulting in an around 5x speedup over autoregressive translation. Our models and codes are available at https://github.com/hemingkx/Generalized-Aggressive-Decoding.
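The draft-then-verify loop can be sketched schematically (a toy illustration with a stand-in "model"; not the GAD implementation): only the longest prefix of the draft that agrees with the verifier's own greedy choices is kept, so the final output is identical to plain greedy decoding.

```python
# Schematic sketch of draft-then-verify: a fast drafter proposes a block of
# tokens in parallel; the autoregressive verifier keeps the longest prefix
# whose tokens match its greedy choices and corrects the first mismatch.
def verify_draft(draft, greedy_next_token, prefix):
    accepted = []
    ctx = list(prefix)
    for tok in draft:
        expected = greedy_next_token(ctx)
        if tok != expected:
            accepted.append(expected)  # replace first mismatch, then stop
            break
        accepted.append(tok)
        ctx.append(tok)
    return accepted

# toy "model": the greedy next token is just the previous token + 1
greedy = lambda ctx: ctx[-1] + 1
print(verify_draft([2, 3, 9, 5], greedy, prefix=[1]))  # [2, 3, 4]
```

In practice the verifier scores all draft positions in one parallel forward pass, which is where the speedup comes from.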
    Scaling Up Models and Data with $\texttt{t5x}$ and $\texttt{seqio}$. (arXiv:2203.17189v1 [cs.LG])
    Recent neural network-based language models have benefited greatly from scaling up the size of training datasets and the number of parameters in the models themselves. Scaling can be complicated due to various factors including the need to distribute computation on supercomputer clusters (e.g., TPUs), prevent bottlenecks when infeeding data, and ensure reproducible results. In this work, we present two software libraries that ease these issues: $\texttt{t5x}$ simplifies the process of building and training large language models at scale while maintaining ease of use, and $\texttt{seqio}$ provides a task-based API for simple creation of fast and reproducible training data and evaluation pipelines. These open-source libraries have been used to train models with hundreds of billions of parameters on datasets with multiple terabytes of training data. Along with the libraries, we release configurations and instructions for T5-like encoder-decoder models as well as GPT-like decoder-only architectures. $\texttt{t5x}$ and $\texttt{seqio}$ are open source and available at https://github.com/google-research/t5x and https://github.com/google/seqio, respectively.
    Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data. (arXiv:2203.17113v1 [cs.SD])
    This paper studies a novel pre-training technique with unpaired speech data, Speech2C, for encoder-decoder based automatic speech recognition (ASR). Within a multi-task learning framework, we introduce two pre-training tasks for the encoder-decoder network using acoustic units, i.e., pseudo codes, derived from an offline clustering model. One is to predict the pseudo codes via masked language modeling on the encoder output, like the HuBERT model, while the other lets the decoder learn to reconstruct pseudo codes autoregressively instead of generating textual scripts. In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text. Comprehensive experiments on the LibriSpeech corpus show that the proposed Speech2C can relatively reduce the word error rate (WER) by 19.2% over the method without decoder pre-training, and also significantly outperforms the state-of-the-art wav2vec 2.0 and HuBERT on fine-tuning subsets of 10h and 100h.
    Imitate and Repurpose: Learning Reusable Robot Movement Skills From Human and Animal Behaviors. (arXiv:2203.17138v1 [cs.RO])
    We investigate the use of prior knowledge of human and animal movement to learn reusable locomotion skills for real legged robots. Our approach builds upon previous work on imitating human or dog Motion Capture (MoCap) data to learn a movement skill module. Once learned, this skill module can be reused for complex downstream tasks. Importantly, due to the prior imposed by the MoCap data, our approach does not require extensive reward engineering to produce sensible and natural looking behavior at the time of reuse. This makes it easy to create well-regularized, task-oriented controllers that are suitable for deployment on real robots. We demonstrate how our skill module can be used for imitation, and train controllable walking and ball dribbling policies for both the ANYmal quadruped and OP3 humanoid. These policies are then deployed on hardware via zero-shot simulation-to-reality transfer. Accompanying videos are available at https://bit.ly/robot-npmp.
    Quantum-Aided Meta-Learning for Bayesian Binary Neural Networks via Born Machines. (arXiv:2203.17089v1 [quant-ph])
    Near-term noisy intermediate-scale quantum circuits can efficiently implement implicit probabilistic models in discrete spaces, supporting distributions that are practically infeasible to sample from using classical means. One of the possible applications of such models, also known as Born machines, is probabilistic inference, which is at the core of Bayesian methods. This paper studies the use of Born machines for the problem of training binary Bayesian neural networks. In the proposed approach, a Born machine is used to model the variational distribution of the binary weights of the neural network, and data from multiple tasks is used to reduce training data requirements on new tasks. The method combines gradient-based meta-learning and variational inference via Born machines, and is shown in a prototypical regression problem to outperform conventional joint learning strategies.
    Traffic4cast at NeurIPS 2021 - Temporal and Spatial Few-Shot Transfer Learning in Gridded Geo-Spatial Processes. (arXiv:2203.17070v1 [cs.LG])
    The IARAI Traffic4cast competitions at NeurIPS 2019 and 2020 showed that neural networks can successfully predict future traffic conditions 1 hour into the future on simply aggregated GPS probe data in time and space bins. We thus reinterpreted the challenge of forecasting traffic conditions as a movie completion task. U-Nets proved to be the winning architecture, demonstrating an ability to extract relevant features in this complex real-world geo-spatial process. Building on the previous competitions, Traffic4cast 2021 now focuses on the question of model robustness and generalizability across time and space. Moving from one city to an entirely different city, or moving from pre-COVID times to times after COVID hit the world, introduces a clear domain shift. We thus, for the first time, release data featuring such domain shifts. The competition now covers ten cities over 2 years, providing data compiled from over 10^12 GPS probe points. Winning solutions captured traffic dynamics sufficiently well to even cope with these complex domain shifts. Surprisingly, this seemed to require only the previous 1h traffic dynamic history and the static road graph as input.
    SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. (arXiv:2203.17001v1 [eess.AS])
    Deep learning based singing voice synthesis (SVS) systems have been demonstrated to flexibly generate singing with better quality, compared to conventional statistical parametric based methods. However, neural systems are generally data-hungry and have difficulty reaching reasonable singing quality with the limited publicly available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize the training, we introduce the cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that our proposed augmentation methods and the stabilizing training strategy can significantly improve the performance on both objective and subjective evaluations.
    WavThruVec: Latent speech representation as intermediate features for neural speech synthesis. (arXiv:2203.16930v1 [cs.SD])
    Recent advances in neural text-to-speech research have been dominated by two-stage pipelines utilizing low-level intermediate speech representations such as mel-spectrograms. However, such predetermined features are fundamentally limited, because they do not allow the model to exploit the full potential of a data-driven approach through learning hidden representations. For this reason, several end-to-end methods have been proposed. However, such models are harder to train and require a large number of high-quality recordings with transcriptions. Here, we propose WavThruVec - a two-stage architecture that resolves the bottleneck by using high-dimensional Wav2Vec 2.0 embeddings as the intermediate speech representation. Since these hidden activations provide high-level linguistic features, they are more robust to noise. That allows us to utilize annotated speech datasets of a lower quality to train the first-stage module. At the same time, the second-stage component can be trained on large-scale untranscribed audio corpora, as Wav2Vec 2.0 embeddings are time-aligned and speaker-independent. This results in an increased generalization capability to out-of-vocabulary words, as well as better generalization to unseen speakers. We show that the proposed model not only matches the quality of state-of-the-art neural models, but also presents useful properties enabling tasks like voice conversion or zero-shot synthesis.
    To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo. (arXiv:2203.16682v1 [cs.CV])
    We present a debiased dataset for the Person-centric Visual Grounding (PCVG) task first proposed by Cui et al. (2021) in the Who's Waldo dataset. Given an image and a caption, PCVG requires pairing up a person's name mentioned in a caption with a bounding box that points to the person in the image. We find that the original Who's Waldo dataset compiled for this task contains a large number of biased samples that are solvable simply by heuristic methods; for instance, in many cases the first name in the sentence corresponds to the largest bounding box, or the sequence of names in the sentence corresponds to an exact left-to-right order in the image. Naturally, models trained on these biased data lead to over-estimation of performance on the benchmark. To enforce models being correct for the correct reasons, we design automated tools to filter and debias the original dataset by ruling out all examples of insufficient context, such as those with no verb or with a long chain of conjunct names in their captions. Our experiments show that our new sub-sampled dataset contains less bias with much lowered heuristic performances and widened gaps between heuristic and supervised methods. We also demonstrate the same benchmark model trained on our debiased training set outperforms that trained on the original biased (and larger) training set on our debiased test set. We argue our debiased dataset offers the PCVG task a more practical baseline for reliable benchmarking and future improvements.
    ESGBERT: Language Model to Help with Classification Tasks Related to Companies Environmental, Social, and Governance Practices. (arXiv:2203.16788v1 [cs.CL])
    Environmental, Social, and Governance (ESG) are non-financial factors that are garnering attention from investors as they increasingly look to apply these as part of their analysis to identify material risks and growth opportunities. Some of this attention is also driven by clients who, now more aware than ever, are demanding for their money to be managed and invested responsibly. As the interest in ESG grows, so does the need for investors to have access to consumable ESG information. Since most of it is in text form in reports, disclosures, press releases, and 10-Q filings, we see a need for sophisticated NLP techniques for classification tasks on ESG text. We hypothesize that an ESG domain-specific pre-trained model will help with such tasks, and study building one in this paper. We explored doing this by fine-tuning BERT's pre-trained weights using ESG-specific text and then further fine-tuning the model for a classification task. We were able to achieve accuracy better than the original BERT and baseline models in environment-specific classification tasks.
    Flat-topped Probability Density Functions for Mixture Models. (arXiv:2203.17027v1 [cs.LG])
    This paper investigates probability density functions (PDFs) that are continuous everywhere, nearly uniform around the mode of the distribution, and adaptable to a variety of distribution shapes ranging from bell-shaped to rectangular. From the viewpoint of computational tractability, the PDF based on the Fermi-Dirac or logistic function is advantageous in estimating its shape parameters. The most appropriate PDF for an $n$-variate distribution is of the form: $p\left(\mathbf{x}\right)\propto\left[\cosh\left(\left[\left(\mathbf{x}-\mathbf{m}\right)^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\mathbf{m}\right)\right]^{n/2}\right)+\cosh\left(r^{n}\right)\right]^{-1}$ where $\mathbf{x},\mathbf{m}\in\mathbb{R}^{n}$, $\boldsymbol{\Sigma}$ is an $n\times n$ positive definite matrix, and $r>0$ is a shape parameter. The flat-topped PDFs can be used as a component of mixture models in machine learning to improve goodness of fit and make a model as simple as possible.
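The density above can be evaluated directly. A minimal sketch for the scalar case $n = 1$ with $\boldsymbol{\Sigma} = \sigma^2$ (the parameter values below are illustrative, and the normalizing constant is omitted):

```python
import math

# Unnormalized flat-topped density from the abstract, specialized to n = 1
# with Sigma = sigma^2:  p(x) ∝ 1 / (cosh(|x - m| / sigma) + cosh(r))
def flat_top_unnormalized(x, m=0.0, sigma=1.0, r=2.0):
    # [ (x - m)^T Sigma^{-1} (x - m) ]^{n/2} with n = 1 reduces to |x - m| / sigma
    q = ((x - m) ** 2 / sigma ** 2) ** 0.5
    return 1.0 / (math.cosh(q) + math.cosh(r))

# nearly uniform around the mode, decaying once |x - m| exceeds ~ r * sigma
vals = [flat_top_unnormalized(x) for x in (0.0, 1.0, 2.0, 4.0)]
print([round(v, 4) for v in vals])
```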
    Multimodal Fusion Transformer for Remote Sensing Image Classification. (arXiv:2203.16952v1 [cs.CV])
    The vision transformer (ViT) has been trending in image classification tasks due to its promising performance compared to convolutional neural networks (CNNs). As a result, many researchers have tried to incorporate ViT models into hyperspectral image (HSI) classification tasks, but without achieving satisfactory performance. In this paper, we introduce a new multimodal fusion transformer (MFT) network for HSI land-cover classification, which utilizes other sources of multimodal data in addition to HSI. Instead of using conventional feature fusion techniques, the other multimodal data are used as an external classification (CLS) token in the transformer encoder, which helps achieve better generalization. ViT and other similar transformer models use a randomly initialized external classification token and fail to generalize well. However, the use of a feature embedding derived from other sources of multimodal data, such as light detection and ranging (LiDAR), offers the potential to improve these models by means of such a CLS token. The concept of tokenization is used in our work to generate CLS and HSI patch tokens, helping to learn key features in a reduced feature space. We also introduce a new attention mechanism for improving the exchange of information between HSI tokens and the CLS (e.g., LiDAR) token. Extensive experiments are carried out on widely used benchmark datasets, i.e., the University of Houston, Trento, University of Southern Mississippi Gulfpark (MUUFL), and Augsburg. In the results section, we compare the proposed MFT model with other state-of-the-art transformer models, classical CNN models, and conventional classifiers. The superior performance achieved by the proposed model is due to the use of multimodal information as external classification tokens.
    HiFi-VC: High Quality ASR-Based Voice Conversion. (arXiv:2203.16937v1 [cs.SD])
    The goal of voice conversion (VC) is to convert input voice to match the target speaker's voice while keeping text and prosody intact. VC is usually used in entertainment and speaking-aid systems, and is also applied to speech data generation and augmentation. The development of any-to-any VC systems, which are capable of generating voices unseen during model training, is of particular interest to both researchers and industry. Despite recent progress, any-to-any conversion quality is still inferior to natural speech. In this work, we propose a new any-to-any voice conversion pipeline. Our approach uses automated speech recognition (ASR) features, pitch tracking, and a state-of-the-art waveform prediction model. According to multiple subjective and objective evaluations, our method outperforms modern baselines in terms of voice quality, similarity, and consistency.
    It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher. (arXiv:2203.17008v1 [cs.CV])
    Model quantization is considered a promising method to greatly reduce the resource requirements of deep neural networks. To deal with the performance drop induced by quantization errors, a popular method is to use training data to fine-tune quantized networks. In real-world environments, however, such a method is frequently infeasible because training data is unavailable due to security, privacy, or confidentiality concerns. Zero-shot quantization addresses such problems, usually by taking information from the weights of a full-precision teacher network to compensate for the performance drop of the quantized networks. In this paper, we first analyze the loss surface of state-of-the-art zero-shot quantization techniques and provide several findings. In contrast to usual knowledge distillation problems, zero-shot quantization often suffers from 1) the difficulty of optimizing multiple loss terms together, and 2) poor generalization capability due to the use of synthetic samples. Furthermore, we observe that many weights fail to cross the rounding threshold while training the quantized networks even when it is necessary to do so for better performance. Based on these observations, we propose AIT, a simple yet powerful technique for zero-shot quantization, which addresses the aforementioned two problems in the following way: AIT i) uses a KL distance loss only, without a cross-entropy loss, and ii) manipulates gradients to guarantee that a certain portion of weights are properly updated after crossing the rounding thresholds. Experiments show that AIT outperforms many existing methods by a large margin, taking over the overall state-of-the-art position in the field.
    Quasi-orthogonality and intrinsic dimensions as measures of learning and generalisation. (arXiv:2203.16687v1 [cs.LG])
    Finding the best architectures of learning machines, such as deep neural networks, is a well-known technical and theoretical challenge. Recent work by Mellor et al. (2021) showed that there may exist correlations between the accuracies of trained networks and the values of some easily computable measures defined on randomly initialised networks, which may make it possible to search tens of thousands of neural architectures without training. Mellor et al. used the Hamming distance evaluated over all ReLU neurons as such a measure. Motivated by these findings, in our work we ask whether other, perhaps more principled, measures exist that could be used as determinants of success of a given neural architecture. In particular, we examine whether the dimensionality and quasi-orthogonality of neural networks' feature space could be correlated with the network's performance after training. Using the setup of Mellor et al., we show that dimensionality and quasi-orthogonality may jointly serve as network performance discriminants. In addition to offering new opportunities to accelerate neural architecture search, our findings suggest important relationships between the networks' final performance and the properties of their randomly initialised feature spaces: data dimension and quasi-orthogonality.
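One simple way to quantify quasi-orthogonality of a feature space is the mean absolute pairwise cosine similarity of feature vectors: values near zero mean the features are nearly orthogonal. A minimal sketch (an illustrative measure only; the paper's exact definition may differ):

```python
import numpy as np

def quasi_orthogonality(features):
    """Mean absolute pairwise cosine similarity of feature vectors.
    Values near 0 indicate a quasi-orthogonal feature space."""
    F = np.asarray(features, dtype=float)
    F = F / np.linalg.norm(F, axis=1, keepdims=True)   # unit-normalize rows
    G = F @ F.T                                        # Gram matrix of cosines
    off = G[~np.eye(len(F), dtype=bool)]               # off-diagonal entries
    return np.mean(np.abs(off))
```

Random Gaussian vectors in high dimensions are nearly orthogonal, so this score shrinks as the ambient dimension grows, which is one reason dimensionality and quasi-orthogonality are naturally studied together.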
    The ideal data compression and automatic discovery of hidden law using neural network. (arXiv:2203.16941v1 [cs.LG])
    Machine learning using neural networks has developed rapidly in recent years, and many new methods have been suggested. On the other hand, a system with true versatility has not yet been developed, and there remain many fields in which the human brain has advantages over machine learning. We considered how the human brain recognizes and memorizes events, and succeeded in reproducing this mechanism in a machine learning model with a new autoencoder neural network (NN). Previous autoencoders have the problem that they cannot clearly define the features of the input data, which requires artificially restricting the middle layer of the autoencoder. We solve this problem by defining a new loss function that reflects information entropy, which enables the NN to compress the input data ideally and automatically discover the hidden law behind the input data set. The loss function used in our NN is based on the free-energy principle, known as the unified brain theory, and our study is the first concrete formalization of this principle. The results of this study can be applied to any kind of data analysis and also to cognitive science.
    An Exploration of Prompt Tuning on Generative Spoken Language Model for Speech Processing Tasks. (arXiv:2203.16773v1 [eess.AS])
    Speech representations learned from self-supervised learning (SSL) models have been found beneficial for various speech processing tasks. However, utilizing SSL representations usually requires fine-tuning the pre-trained models or designing task-specific downstream models and loss functions, which incurs substantial memory usage and human labor. On the other hand, prompting in Natural Language Processing (NLP) is an efficient and widely used technique to leverage pre-trained language models (LMs). Nevertheless, such a paradigm is little studied in the speech community. In this paper, we report the first exploration of the prompt tuning paradigm for speech processing tasks based on the Generative Spoken Language Model (GSLM). Experimental results show that the prompt tuning technique achieves competitive performance on speech classification tasks with fewer trainable parameters than fine-tuning specialized downstream models. We further study the technique on challenging sequence generation tasks, where prompt tuning also demonstrates its potential; its limitations and possible research directions are discussed in this paper.
    Certified machine learning: A posteriori error estimation for physics-informed neural networks. (arXiv:2203.17055v1 [cs.LG])
    Physics-informed neural networks (PINNs) are one popular approach to introducing a priori knowledge about physical systems into the learning framework. PINNs are known to be robust for smaller training sets, to exhibit better generalization properties, and to be faster to train. In this paper, we show that using PINNs in comparison with purely data-driven neural networks is not only favorable for training performance but also allows us to extract significant information on the quality of the approximated solution. Assuming that the underlying differential equation for the PINN training is an ordinary differential equation, we derive a rigorous upper limit on the PINN prediction error. This bound is applicable even for input data not included in the training phase and without any prior knowledge about the true solution. Therefore, our a posteriori error estimation is an essential step to certify the PINN. We apply our error estimator to two academic toy problems, one of which falls into the category of model predictive control and thereby shows the practical use of the derived results.
    Exploiting Explainable Metrics for Augmented SGD. (arXiv:2203.16723v1 [cs.LG])
    Explaining the generalization characteristics of deep learning is an emerging topic in advanced machine learning. There are several unanswered questions about how learning under stochastic optimization really works and why certain strategies are better than others. In this paper, we address the following question: \textit{can we probe intermediate layers of a deep neural network to identify and quantify the learning quality of each layer?} With this question in mind, we propose new explainability metrics that measure the redundant information in a network's layers using a low-rank factorization framework and quantify a complexity measure that is highly correlated with the generalization performance of a given optimizer, network, and dataset. We subsequently exploit these metrics to augment the Stochastic Gradient Descent (SGD) optimizer by adaptively adjusting the learning rate in each layer to improve generalization performance. Our augmented SGD -- dubbed RMSGD -- introduces minimal computational overhead compared to SOTA methods and outperforms them by exhibiting strong generalization characteristics across applications, architectures, and datasets.
    Assessing the risk of re-identification arising from an attack on anonymised data. (arXiv:2203.16921v1 [cs.LG])
    Objective: The use of routinely-acquired medical data for research purposes requires the protection of patient confidentiality via data anonymisation. The objective of this work is to calculate the risk of re-identification arising from a malicious attack to an anonymised dataset, as described below. Methods: We first present an analytical means of estimating the probability of re-identification of a single patient in a k-anonymised dataset of Electronic Health Record (EHR) data. Second, we generalize this solution to obtain the probability of multiple patients being re-identified. We provide synthetic validation via Monte Carlo simulations to illustrate the accuracy of the estimates obtained. Results: The proposed analytical framework for risk estimation provides re-identification probabilities that are in agreement with those provided by simulation in a number of scenarios. Our work is limited by conservative assumptions which inflate the re-identification probability. Discussion: Our estimates show that the re-identification probability increases with the proportion of the dataset maliciously obtained and that it has an inverse relationship with the equivalence class size. Our recursive approach extends the applicability domain to the general case of a multi-patient re-identification attack in an arbitrary k-anonymisation scheme. Conclusion: We prescribe a systematic way to parametrize the k-anonymisation process based on a pre-determined re-identification probability. We observed that the benefits of a reduced re-identification risk that come with increasing k-size may not be worth the reduction in data granularity when one is considering benchmarking the re-identification probability on the size of the portion of the dataset maliciously obtained by the adversary.
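The qualitative findings above can be reproduced with a toy Monte Carlo simulation (illustration only, not the paper's analytical framework; the model and counting rule here are simplifying assumptions of ours): count an attack as a re-identification when the target's record is the only member of its k-sized equivalence class in the adversary's sample, so linkage is unambiguous.

```python
import random

def reident_prob_mc(n_patients, k, frac_obtained, trials=20000, seed=0):
    """Toy Monte Carlo estimate of re-identification probability.
    Patients 0..k-1 form the target's k-sized equivalence class, the
    target is patient 0, and the adversary obtains a uniform random
    subset of the records."""
    rng = random.Random(seed)
    m = int(frac_obtained * n_patients)   # records obtained by the adversary
    hits = 0
    for _ in range(trials):
        sample = rng.sample(range(n_patients), m)
        # unambiguous linkage: target present, rest of its class absent
        if 0 in sample and sum(p < k for p in sample) == 1:
            hits += 1
    return hits / trials
```

Even this crude model exhibits the two trends reported above: the probability grows with the proportion of the dataset obtained and shrinks as the equivalence class size k increases.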
    Equivariant Diffusion for Molecule Generation in 3D. (arXiv:2203.17003v1 [cs.LG])
    This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types). In addition, we provide a probabilistic analysis which admits likelihood computation of molecules using our model. Experimentally, the proposed method significantly outperforms previous 3D molecular generative methods regarding the quality of generated samples and efficiency at training time.
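The equivariance property is easy to illustrate with a toy coordinate update in the spirit of such equivariant networks (our own minimal sketch, not the paper's EDM architecture): weights depend only on pairwise distances, so rotating or reflecting the input transforms the output identically.

```python
import numpy as np

def equivariant_update(coords, weight=0.5):
    """Toy E(n)-equivariant coordinate update: each point moves along
    relative direction vectors x_i - x_j, gated by a function of the
    (invariant) squared pairwise distance."""
    diff = coords[:, None, :] - coords[None, :, :]   # pairwise x_i - x_j
    dist2 = (diff ** 2).sum(-1, keepdims=True)       # rotation-invariant
    msg = weight * diff / (1.0 + dist2)              # equivariant messages
    return coords + msg.sum(axis=1)
```

Applying an orthogonal transform Q before or after the update gives the same result, which is the defining property f(Qx) = Q f(x) that the denoising network of an E(3)-equivariant diffusion model must satisfy.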
    Task Adaptive Parameter Sharing for Multi-Task Learning. (arXiv:2203.16708v1 [cs.LG])
    Adapting pre-trained models with broad capabilities has become standard practice for learning a wide range of downstream tasks. The typical approach of fine-tuning different models for each task is performant, but incurs a substantial memory cost. To efficiently learn multiple downstream tasks we introduce Task Adaptive Parameter Sharing (TAPS), a general method for tuning a base model to a new task by adaptively modifying a small, task-specific subset of layers. This enables multi-task learning while minimizing resources used and competition between tasks. TAPS solves a joint optimization problem which determines which layers to share with the base model and the value of the task-specific weights. Further, a sparsity penalty on the number of active layers encourages weight sharing with the base model. Compared to other methods, TAPS retains high accuracy on downstream tasks while introducing few task-specific parameters. Moreover, TAPS is agnostic to the model architecture and requires only minor changes to the training scheme. We evaluate our method on a suite of fine-tuning tasks and architectures (ResNet, DenseNet, ViT) and show that it achieves state-of-the-art performance while being simple to implement.
    Adaptive Estimation of Random Vectors with Bandit Feedback. (arXiv:2203.16810v1 [cs.LG])
    We consider the problem of sequentially learning to estimate, in the mean squared error (MSE) sense, a Gaussian $K$-vector of unknown covariance by observing only $m < K$ of its entries in each round. This reduces to learning an optimal subset for estimating the entire vector. Towards this, we first establish an exponential concentration bound for an estimate of the MSE for each observable subset. We then frame the estimation problem with bandit feedback in the best-subset identification setting. We propose a variant of the successive elimination algorithm to cater to the adaptive estimation problem, and we derive an upper bound on the sample complexity of this algorithm. In addition, to understand the fundamental limit on the sample complexity of this adaptive estimation bandit problem, we derive a minimax lower bound.
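A successive-elimination variant of the kind described above can be sketched as follows (our simplified version: `sample_mse` is a hypothetical noisy oracle for a subset's MSE, and the Hoeffding-style confidence radius is an illustrative choice, not the paper's bound):

```python
import math

def successive_elimination(subsets, sample_mse, delta=0.05, max_rounds=2000):
    """Best-subset identification by successive elimination: sample an
    MSE estimate for each surviving subset each round, and drop any
    subset whose lower confidence bound exceeds the best subset's upper
    confidence bound."""
    active = list(subsets)
    sums = {s: 0.0 for s in active}
    t = 0
    while len(active) > 1 and t < max_rounds:
        t += 1
        for s in active:
            sums[s] += sample_mse(s)
        # Hoeffding-style anytime confidence radius (illustrative)
        radius = math.sqrt(math.log(2 * len(subsets) * t * t / delta) / (2 * t))
        means = {s: sums[s] / t for s in active}
        best = min(means.values())
        active = [s for s in active if means[s] - radius <= best + radius]
    return min(active, key=lambda s: sums[s] / t)
```

With sub-Gaussian observations this identifies the minimum-MSE subset with probability at least 1 - delta, and the number of rounds each suboptimal subset survives scales with the inverse squared gap to the best subset.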
    Generating High Fidelity Data from Low-density Regions using Diffusion Models. (arXiv:2203.17260v1 [cs.CV])
    Our work focuses on addressing sample deficiency from low-density regions of data manifold in common image datasets. We leverage diffusion process based generative models to synthesize novel images from low-density regions. We observe that uniform sampling from diffusion models predominantly samples from high-density regions of the data manifold. Therefore, we modify the sampling process to guide it towards low-density regions while simultaneously maintaining the fidelity of synthetic data. We rigorously demonstrate that our process successfully generates novel high fidelity samples from low-density regions. We further examine generated samples and show that the model does not memorize low-density data and indeed learns to generate novel samples from low-density regions.
    An Empirical Study of Language Model Integration for Transducer based Speech Recognition. (arXiv:2203.16776v1 [eess.AS])
    Utilizing text-only data with an external language model (LM) in end-to-end RNN-Transducer (RNN-T) speech recognition is challenging. Recently, a class of methods such as density ratio (DR) and internal LM estimation (ILME) have been developed, outperforming the classic shallow fusion (SF) method. The basic idea behind these methods is that the implicitly learned internal LM (ILM) prior should first be subtracted from the RNN-T posterior in order to integrate the external LM. While recent studies suggest that RNN-T only learns some low-order language model information, the DR method uses a well-trained ILM. We hypothesize that this setting is inappropriate and may deteriorate the performance of the DR method, and propose a low-order density ratio method (LODR) that trains a low-order weak ILM for DR. Extensive empirical experiments are conducted in both in-domain and cross-domain scenarios on the English LibriSpeech & Tedlium-2 and Chinese WenetSpeech & AISHELL-1 datasets. LODR consistently outperforms SF in all tasks, while performing generally close to ILME and better than DR in most tests.
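In log space the two integration styles reduce to simple score combinations. A minimal sketch (the weights are illustrative hyperparameters of ours; in the LODR variant `log_p_ilm` comes from a deliberately low-order weak LM rather than a well-trained one):

```python
def shallow_fusion_score(log_p_rnnt, log_p_ext_lm, lm_weight=0.5):
    """Classic shallow fusion: add the weighted external LM score."""
    return log_p_rnnt + lm_weight * log_p_ext_lm

def density_ratio_score(log_p_rnnt, log_p_ext_lm, log_p_ilm,
                        lm_weight=0.5, ilm_weight=0.3):
    """Density-ratio-style integration: subtract the internal LM prior
    from the RNN-T posterior before adding the external LM."""
    return log_p_rnnt + lm_weight * log_p_ext_lm - ilm_weight * log_p_ilm
```

Both scores would be accumulated per hypothesis during beam search; the only structural difference between the families is the subtracted ILM term.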
    Towards Driving-Oriented Metric for Lane Detection Models. (arXiv:2203.16851v1 [cs.CV])
    After the 2017 TuSimple Lane Detection Challenge, its dataset and evaluation based on accuracy and F1 score have become the de facto standard to measure the performance of lane detection methods. While they have played a major role in improving the performance of lane detection methods, the validity of this evaluation method in downstream tasks has not been adequately researched. In this study, we design 2 new driving-oriented metrics for lane detection: End-to-End Lateral Deviation metric (E2E-LD) is directly formulated based on the requirements of autonomous driving, a core downstream task of lane detection; Per-frame Simulated Lateral Deviation metric (PSLD) is a lightweight surrogate metric of E2E-LD. To evaluate the validity of the metrics, we conduct a large-scale empirical study with 4 major types of lane detection approaches on the TuSimple dataset and our newly constructed dataset Comma2k19-LD. Our results show that the conventional metrics have strongly negative correlations ($\leq$-0.55) with E2E-LD, meaning that some recent improvements purely targeting the conventional metrics may not have led to meaningful improvements in autonomous driving, but rather may actually have made it worse by overfitting to the conventional metrics. As autonomous driving is a security/safety-critical system, the underestimation of robustness hinders the sound development of practical lane detection models. We hope that our study will help the community achieve more downstream task-aware evaluations for lane detection.
    System Identification via Nuclear Norm Regularization. (arXiv:2203.16673v1 [stat.ML])
    This paper studies the problem of identifying low-order linear systems via Hankel nuclear norm regularization. Hankel regularization encourages the low-rankness of the Hankel matrix, which maps to the low-orderness of the system. We provide a novel statistical analysis for this regularization and carefully contrast it with the unregularized ordinary least-squares (OLS) estimator. Our analysis leads to new bounds on estimating the impulse response and the Hankel matrix associated with the linear system. We first design an input excitation and show that Hankel regularization enables one to recover the system using an optimal number of observations in the true system order and achieve strong statistical estimation rates. Surprisingly, we demonstrate that the input design indeed matters, by showing that intuitive choices such as i.i.d. Gaussian input lead to provably sub-optimal sample complexity. To better understand the benefits of regularization, we also revisit the OLS estimator. Besides refining existing bounds, we experimentally identify when the regularized approach improves over OLS: (1) for low-order systems with slow impulse-response decay, the OLS method performs poorly in terms of sample complexity; (2) the Hankel matrix returned by regularization has a clearer singular value gap that eases identification of the system order; (3) Hankel regularization is less sensitive to hyperparameter choice. Finally, we establish model selection guarantees through a joint train-validation procedure in which we tune the regularization parameter for near-optimal estimation.
    Predicting extreme events from data using deep machine learning: when and where. (arXiv:2203.17155v1 [cs.LG])
    We develop a deep convolutional neural network (DCNN) based framework for model-free prediction of the occurrence of extreme events both in time ("when") and in space ("where") in nonlinear physical systems of spatial dimension two. The measurements or data are a set of two-dimensional snapshots or images. For a desired time horizon of prediction, a proper labeling scheme can be designated to enable successful training of the DCNN and subsequent prediction of extreme events in time. Given that an extreme event has been predicted to occur within the time horizon, a space-based labeling scheme can be applied to predict, within a certain resolution, the location at which the event will occur. We use synthetic data from the 2D complex Ginzburg-Landau equation and empirical wind speed data of the North Atlantic ocean to demonstrate and validate our machine-learning based prediction framework. The trade-offs among the prediction horizon, spatial resolution, and accuracy are illustrated, and the detrimental effect of spatially biased occurrence of extreme events on prediction accuracy is discussed. The deep learning framework is viable for predicting extreme events in the real world.
    Conditional Autoregressors are Interpretable Classifiers. (arXiv:2203.17002v1 [cs.LG])
    We explore the use of class-conditional autoregressive (CA) models to perform image classification on MNIST-10. Autoregressive models assign probability to an entire input by combining probabilities from each individual feature; hence classification decisions made by a CA can be readily decomposed into contributions from each input feature. That is to say, CAs are inherently locally interpretable. Our experiments show that naively training a CA achieves much worse accuracy than a standard classifier; however, this is due to over-fitting and not a lack of expressive power. Using knowledge distillation from a standard classifier, a student CA can be trained to match the performance of the teacher while still being interpretable.
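The classification rule behind this is just Bayes' rule over the class-conditional likelihoods. A minimal sketch (the per-class model interface is a hypothetical stand-in for a trained autoregressive model, which would return one log-probability per feature via the chain rule):

```python
def ca_classify(x, log_px_given_c, log_prior):
    """Class-conditional autoregressive classification: pick the class
    maximizing log p(x|c) + log p(c). Each log p(x|c) is a sum of
    per-feature terms, so the decision decomposes feature-by-feature
    and is locally interpretable.
    `log_px_given_c[c]` maps input x to a list of per-feature log-probs."""
    scores = {}
    for c, model in log_px_given_c.items():
        per_feature = model(x)                 # chain-rule factor per feature
        scores[c] = sum(per_feature) + log_prior[c]
    return max(scores, key=scores.get), scores
```

Inspecting the `per_feature` terms for the winning and losing classes gives the per-pixel attribution that makes the classifier interpretable.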
    Generating Scientific Articles with Machine Learning. (arXiv:2203.16569v1 [cs.LG])
    In recent years, the field of machine learning has seen rapid growth, with applications in a variety of domains, including image recognition, natural language processing, and predictive modeling. In this paper, we explore the application of machine learning to the generation of scientific articles. We present a method that uses a machine-learning algorithm, trained on a data set of scientific papers, to learn the structure of a scientific article and generate new articles from it. We evaluate the performance of the method by comparing a generated article to a set of manually written articles. The results show that the machine-generated article is of similar quality to the manually written articles.
    Spatially Adaptive Online Prediction of Piecewise Regular Functions. (arXiv:2203.16587v1 [math.ST])
    We consider the problem of estimating piecewise regular functions in an online setting, i.e., the data arrive sequentially and at any round our task is to predict the value of the true function at the next revealed point using the available data from past predictions. We propose a suitably modified version of a recently developed online learning algorithm called the sleeping experts aggregation algorithm. We show that this estimator satisfies oracle risk bounds simultaneously for all local regions of the domain. As concrete instantiations of the expert aggregation algorithm proposed here, we study an online mean aggregation and an online linear regression aggregation algorithm where experts correspond to the set of dyadic subrectangles of the domain. The resulting algorithms are computable in near-linear time in the sample size. We specifically focus on the performance of these online algorithms in the context of estimating piecewise polynomial and bounded variation function classes in the fixed design setup. The simultaneous oracle risk bounds we obtain for these estimators in this context provide new and improved (in certain aspects) guarantees even in the batch setting and are not available for the state-of-the-art batch learning estimators.
    Robust Meta-Reinforcement Learning with Curriculum-Based Task Sampling. (arXiv:2203.16801v1 [cs.LG])
    Meta-reinforcement learning (meta-RL) acquires meta-policies that perform well on tasks from a wide task distribution. However, conventional meta-RL, which learns meta-policies by randomly sampling tasks, has been reported to show meta-overfitting for certain tasks, especially for easy tasks where an agent can easily obtain high scores. To reduce the effects of meta-overfitting, we consider meta-RL with curriculum-based task sampling. Our method, Robust Meta Reinforcement Learning with Guided Task Sampling (RMRL-GTS), is an effective method that restricts task sampling based on scores and epochs. We show that in order to achieve robust meta-RL, it is necessary not only to intensively sample tasks with poor scores, but also to restrict and expand the task regions of the tasks to be sampled.
    Mask Atari for Deep Reinforcement Learning as POMDP Benchmarks. (arXiv:2203.16777v1 [cs.CV])
    We present Mask Atari, a new benchmark to help solve partially observable Markov decision process (POMDP) problems with Deep Reinforcement Learning (DRL)-based approaches. To provide a simulation environment for POMDP problems, Mask Atari is constructed from Atari 2600 games with controllable, moveable, and learnable masks as the observation area of the target agent, especially with the active information gathering (AIG) setting of POMDPs. Given that no such benchmark yet exists, Mask Atari provides a challenging, efficient benchmark for evaluating methods that focus on this problem. Moreover, the mask operation is an attempt to introduce the receptive field of the human vision system into a simulation environment for an agent, which means the evaluations are not biased by sensing ability and focus purely on the cognitive performance of the methods when compared with the human baseline. We describe the challenges and features of our benchmark and evaluate several baselines with Mask Atari.
    Learning the Effect of Registration Hyperparameters with HyperMorph. (arXiv:2203.16680v1 [cs.CV])
    We introduce HyperMorph, a framework that facilitates efficient hyperparameter tuning in learning-based deformable image registration. Classical registration algorithms perform an iterative pair-wise optimization to compute a deformation field that aligns two images. Recent learning-based approaches leverage large image datasets to learn a function that rapidly estimates a deformation for a given image pair. In both strategies, the accuracy of the resulting spatial correspondences is strongly influenced by the choice of certain hyperparameter values. However, an effective hyperparameter search consumes substantial time and human effort as it often involves training multiple models for different fixed hyperparameter values and may lead to suboptimal registration. We propose an amortized hyperparameter learning strategy to alleviate this burden by learning the impact of hyperparameters on deformation fields. We design a meta network, or hypernetwork, that predicts the parameters of a registration network for input hyperparameters, thereby comprising a single model that generates the optimal deformation field corresponding to given hyperparameter values. This strategy enables fast, high-resolution hyperparameter search at test-time, reducing the inefficiency of traditional approaches while increasing flexibility. We also demonstrate additional benefits of HyperMorph, including enhanced robustness to model initialization and the ability to rapidly identify optimal hyperparameter values specific to a dataset, image contrast, task, or even anatomical region, all without the need to retrain models. We make our code publicly available at this http URL
    MAE-AST: Masked Autoencoding Audio Spectrogram Transformer. (arXiv:2203.16691v1 [eess.AS])
    In this paper, we propose a simple yet powerful improvement over the recent Self-Supervised Audio Spectrogram Transformer (SSAST) model for speech and audio classification. Specifically, we leverage the insight that the SSAST uses a very high masking ratio (75%) during pretraining, meaning that the vast majority of self-attention compute is performed on mask tokens. We address this by integrating the encoder-decoder architecture from Masked Autoencoders are Scalable Vision Learners (MAE) into the SSAST, where a deep encoder operates on only unmasked input, and a shallow decoder operates on encoder outputs and mask tokens. We find that MAE-like pretraining can provide a 3x speedup and 2x memory usage reduction over the vanilla SSAST using current audio pretraining strategies with ordinary model and input sizes. When fine-tuning on downstream tasks, which only uses the encoder, we find that our approach outperforms the SSAST on a variety of downstream tasks. We further conduct comprehensive evaluations into different strategies of pretraining and explore differences in MAE-style pretraining between the visual and audio domains.
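The source of the speedup is structural: the deep encoder never sees mask tokens. A minimal sketch of that forward pass (the `encoder`/`decoder` callables and token layout are illustrative stand-ins for the transformer stacks, not the paper's implementation):

```python
import numpy as np

def mae_forward(tokens, mask_ratio, encoder, decoder, mask_token, rng):
    """MAE-style pretraining step: run the deep encoder only on kept
    (unmasked) tokens, then reinsert mask tokens before the shallow
    decoder, which reconstructs the masked positions."""
    n = len(tokens)
    n_keep = max(1, int(n * (1 - mask_ratio)))
    idx = rng.permutation(n)
    keep, masked = idx[:n_keep], idx[n_keep:]
    latent = encoder(tokens[keep])        # heavy compute on kept tokens only
    full = np.empty_like(tokens)
    full[keep] = latent
    full[masked] = mask_token             # shallow decoder sees all positions
    return decoder(full), masked
```

With a 75% masking ratio the encoder processes only a quarter of the sequence, which is where the reported ~3x speedup and memory savings come from.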
    Direction of Arrival Estimation of Sound Sources Using Icosahedral CNNs. (arXiv:2203.16940v1 [eess.AS])
    In this paper, we present a new model for Direction of Arrival (DOA) estimation of sound sources based on an Icosahedral Convolutional Neural Network (CNN) applied over SRP-PHAT power maps computed from the signals received by a microphone array. This icosahedral CNN is equivariant to the 60 rotational symmetries of the icosahedron, which represent a good approximation of the continuous space of spherical rotations, and can be implemented using standard 2D convolutional layers, having a lower computational cost than most of the spherical CNNs. In addition, instead of using fully connected layers after the icosahedral convolutions, we propose a new soft-argmax function that can be seen as a differentiable version of the argmax function and allows us to solve the DOA estimation as a regression problem interpreting the output of the convolutional layers as a probability distribution. We prove that using models that fit the equivariances of the problem allows us to outperform other state-of-the-art models with a lower computational cost and more robustness, obtaining root mean square localization errors lower than 10{\deg} even in scenarios with a reverberation time $T_{60}$ of 1.5 s.
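The soft-argmax idea can be sketched in a few lines: a softmax over the power map gives a probability distribution over cells, and the output is the expected coordinate, which is differentiable unlike a hard argmax (our minimal version; the temperature `beta` is an illustrative parameter):

```python
import numpy as np

def soft_argmax(power_map, coords, beta=10.0):
    """Differentiable argmax over a power map: softmax the map values
    into a probability distribution and return the expected coordinate.
    `coords[i]` is the direction (e.g. a unit vector) of cell i; larger
    beta sharpens the softmax toward a hard argmax."""
    z = beta * (power_map - power_map.max())   # stabilized logits
    w = np.exp(z)
    w /= w.sum()
    return w @ coords                          # expectation of coordinates
```

Treating the network output as this distribution turns DOA estimation into a regression problem while keeping the whole pipeline end-to-end trainable.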
    Differentially Private Federated Learning via Reconfigurable Intelligent Surface. (arXiv:2203.17028v1 [eess.SP])
    Federated learning (FL), as a disruptive machine learning paradigm, enables the collaborative training of a global model over decentralized local datasets without sharing them. It spans a wide scope of applications from Internet-of-Things (IoT) to biomedical engineering and drug discovery. To support low-latency and high-privacy FL over wireless networks, in this paper, we propose a reconfigurable intelligent surface (RIS) empowered over-the-air FL system to alleviate the dilemma between learning accuracy and privacy. This is achieved by simultaneously exploiting the channel propagation reconfigurability with RIS for boosting the receive signal power, as well as waveform superposition property with over-the-air computation (AirComp) for fast model aggregation. By considering a practical scenario where high-dimensional local model updates are transmitted across multiple communication blocks, we characterize the convergence behaviors of the differentially private federated optimization algorithm. We further formulate a system optimization problem to optimize the learning accuracy while satisfying privacy and power constraints via the joint design of transmit power, artificial noise, and phase shifts at RIS, for which a two-step alternating minimization framework is developed. Simulation results validate our systematic, theoretical, and algorithmic achievements and demonstrate that RIS can achieve a better trade-off between privacy and accuracy for over-the-air FL systems.
    Message Passing Neural Networks for Hypergraphs. (arXiv:2203.16995v1 [cs.LG])
    Hypergraph representations are both more efficient and better suited to describing data characterized by relations between two or more objects. In this work, we present the first message-passing graph neural network capable of processing hypergraph-structured data. We show that the proposed model defines a design space for neural network models for hypergraphs, thus generalizing existing models. We report experiments on a benchmark dataset for node classification, highlighting the effectiveness of the proposed model with respect to other state-of-the-art methods for graphs and hypergraphs. We also discuss the benefits of using hypergraph representations and, at the same time, highlight the limitations of using equivalent graph representations when the underlying problem has relations among more than two objects.
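    Message passing on a hypergraph typically proceeds in two stages: each hyperedge aggregates the features of its member nodes, and each node then aggregates the messages of the hyperedges it belongs to. A generic mean-aggregation sketch of this pattern (our illustration, not the authors' exact update rules):

```python
import numpy as np

def hypergraph_message_pass(X, hyperedges):
    """One round of two-stage message passing on a hypergraph:
    (1) each hyperedge averages the features of its member nodes,
    (2) each node averages the messages of its incident hyperedges."""
    n, d = X.shape
    edge_msgs = [X[list(e)].mean(axis=0) for e in hyperedges]
    out = np.zeros_like(X)
    counts = np.zeros(n)
    for msg, e in zip(edge_msgs, hyperedges):
        for v in e:
            out[v] += msg
            counts[v] += 1
    counts[counts == 0] = 1                 # isolated nodes keep a zero message
    return out / counts[:, None]

X = np.eye(4)                               # 4 nodes with one-hot features
hedges = [{0, 1, 2}, {2, 3}]                # one 3-way and one 2-way relation
H = hypergraph_message_pass(X, hedges)
print(H[2])                                 # node 2 averages both hyperedge messages
```

    Note that the 3-way relation {0, 1, 2} is aggregated as a single unit, which a pairwise graph expansion (cliques of edges) could not represent faithfully.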
    Bangla hate speech detection on social media using attention-based recurrent neural network. (arXiv:2203.16775v1 [cs.CL])
    Hate speech has spread rapidly with the daily use of technology, most notably through the sharing of negative opinions or feelings on social media. Although numerous works have been carried out on detecting hate speech in English, German, and other languages, very few have addressed the Bengali language, even though millions of people communicate on social media in Bengali. The few existing works need improvements in both accuracy and interpretability. This article proposes an encoder-decoder-based machine learning model, a popular tool in NLP, to classify users' Bengali comments from Facebook pages. A dataset of 7,425 Bengali comments, covering seven distinct categories of hate speech, was used to train and evaluate our model. 1D convolutional layers were used to extract and encode local features from the comments. Finally, attention-based, LSTM, and GRU decoders were used to predict the hate speech category. Among the three encoder-decoder variants, the attention-based decoder obtained the best accuracy (77%).
    Dual Temperature Helps Contrastive Learning Without Many Negative Samples: Towards Understanding and Simplifying MoCo. (arXiv:2203.17248v1 [cs.LG])
    Contrastive learning (CL) is widely known to require many negative samples, 65536 in MoCo for instance, for which the performance of a dictionary-free framework is often inferior because the negative sample size (NSS) is limited by its mini-batch size (MBS). To decouple the NSS from the MBS, a dynamic dictionary has been adopted in a large volume of CL frameworks, among which arguably the most popular is the MoCo family. In essence, MoCo adopts a momentum-based queue dictionary, for which we perform a fine-grained analysis of its size and consistency. We point out that the InfoNCE loss used in MoCo implicitly attracts anchors to their corresponding positive samples with varying strengths of penalty, and we identify this inter-anchor hardness-awareness property as a major reason for the necessity of a large dictionary. Our findings motivate us to simplify MoCo v2 via the removal of its dictionary as well as its momentum. Based on an InfoNCE loss with the proposed dual temperature, our simplified frameworks, SimMoCo and SimCo, outperform MoCo v2 by a visible margin. Moreover, our work bridges the gap between CL and non-CL frameworks, contributing to a more unified understanding of these two mainstream frameworks in SSL. Code is available at: https://bit.ly/3LkQbaT.
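    As a rough illustration of the dual-temperature idea (our own simplification, not the authors' implementation): one temperature shapes the InfoNCE loss itself, while a second, sharper temperature produces a weight that modulates each anchor's contribution, standing in for the hardness-awareness a large dictionary would otherwise provide:

```python
import numpy as np

def dual_temperature_infonce(anchor, positive, negatives, tau=0.1, tau_prime=0.05):
    """Illustrative dual-temperature InfoNCE (our simplification): tau shapes
    the per-anchor loss, while a sharper tau_prime yields a weight that
    modulates each anchor's contribution to the batch loss."""
    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()
    sims = np.concatenate(([anchor @ positive], negatives @ anchor))
    loss = -np.log(softmax(sims / tau)[0])      # standard InfoNCE term
    weight = softmax(sims / tau_prime)[0]       # treated as stop-gradient in the real method
    return weight * loss

rng = np.random.default_rng(0)
a = rng.standard_normal(8)
a /= np.linalg.norm(a)
p = a + 0.1 * rng.standard_normal(8)            # positive: a perturbed view of the anchor
p /= np.linalg.norm(p)
negs = rng.standard_normal((16, 8))
negs /= np.linalg.norm(negs, axis=1, keepdims=True)
loss = dual_temperature_infonce(a, p, negs)
print(loss)
```

    In the paper's framing, the second temperature restores inter-anchor hardness-awareness without the dictionary, which is what allows the momentum queue to be removed.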
    Intelligent Icing Detection Model of Wind Turbine Blades Based on SCADA data. (arXiv:2101.07914v1 [cs.LG] CROSS LISTED)
    Diagnosing ice accretion on wind turbine blades remains a persistent challenge in the condition monitoring of wind farms. Existing methods focus on mechanism analysis of the icing process or deviation-degree analysis via feature engineering, while neural networks have so far received little in-depth investigation in this field. Supervisory control and data acquisition (SCADA) makes it possible to train networks by continuously providing not only the operation and performance parameters of wind turbines but also environmental parameters and operation modes. This paper explores the use of convolutional neural networks (CNNs), generative adversarial networks (GANs), and domain adaptation learning to establish intelligent diagnosis frameworks under different training scenarios. Specifically, PGANC and PGANT are proposed for the cases of sufficient and insufficient labeled data from the target wind turbine, respectively. The basic idea is a two-stage training with parallel GANs, aimed at capturing intrinsic features of normal and icing samples, followed by a classification CNN or a domain adaptation module depending on the training case. Model validation on SCADA data from three wind turbines shows that two-stage training can effectively improve model performance. Moreover, when there is insufficient labeled data for a target turbine, an extremely common situation in real industrial practice, the addition of domain adaptation learning yields better performance from the trained model. Overall, our proposed intelligent diagnosis frameworks achieve more accurate detection on the same wind turbine and better generalization to a new wind turbine, compared with other machine learning models and conventional CNNs.
    A data-driven approach for the closure of RANS models by the divergence of the Reynolds Stress Tensor. (arXiv:2203.16944v1 [physics.flu-dyn])
    In the present paper a new data-driven model is proposed to close the RANS equations and increase their accuracy. It is based on the direct approximation of the divergence of the Reynolds Stress Tensor (RST) through a Neural Network (NN), a choice driven by the fact that it is the divergence of the RST that appears in the RANS equations. Furthermore, once this data-driven approach is trained, there is no need to run any turbulence model to close the equations. Finally, it is well known that a good approximation of a function is not necessarily a good approximation of its derivative. The architecture and input choices of the proposed network guarantee both Galilean and coordinate-frame rotation invariance via a vector basis expansion of the divergence of the RST. Two well-known test cases are used to show the advantages of the proposed method over classic turbulence models.
    Graph Node-Feature Convolution for Representation Learning. (arXiv:1812.00086v2 [cs.LG] UPDATED)
    The graph convolutional network (GCN) is an emerging neural network approach. It learns a new representation of a node by aggregating the feature vectors of all its neighbors, without considering whether the neighbors or features are useful. Recent methods have improved on this by sampling a fixed-size set of neighbors or by assigning different weights to different neighbors in the aggregation process, but features within a feature vector are still treated equally. In this paper, we introduce a new convolution operation on regular-size feature maps, constructed from the features of a fixed node bandwidth via sampling, to obtain the first-level node representation, which is then passed to a standard GCN to learn the second-level node representation. Experiments show that our method outperforms competing methods on semi-supervised node classification tasks. Furthermore, our method opens new doors for exploring new GCN architectures, particularly deeper GCN models.
    Data-driven Set-based Estimation of Polynomial Systems with Application to SIR Epidemics. (arXiv:2111.04704v2 [eess.SY] UPDATED)
    This paper proposes a data-driven set-based estimation algorithm for a class of nonlinear systems with polynomial nonlinearities. Using the system's input-output data, the proposed method computes a set that guarantees the inclusion of the system's state in real-time. Although the system is assumed to be of polynomial type, the exact polynomial functions and their coefficients are unknown. To this end, the estimator relies on offline and online phases. The offline phase utilizes past input-output data to estimate a set of possible coefficients of the polynomial system. Then, using this estimated set of coefficients and side information about the system, the online phase provides a set estimate of the state. Finally, the proposed methodology is evaluated through its application to an SIR (Susceptible, Infected, Recovered) epidemic model.
    Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching. (arXiv:2201.11251v2 [cs.LG] UPDATED)
    Subgraph matching is a fundamental problem in various fields that use graph-structured data. Subgraph matching algorithms enumerate all isomorphic embeddings of a query graph q in a data graph G. An important branch of matching algorithms exploits the backtracking search approach, which recursively extends intermediate results following a matching order of query vertices. It has been shown that the matching order plays a critical role in the time efficiency of these backtracking-based subgraph matching algorithms. In recent years, many advanced techniques for query vertex ordering (i.e., matching order generation) have been proposed to reduce the unpromising intermediate results according to preset heuristic rules. In this paper, for the first time we apply Reinforcement Learning (RL) and Graph Neural Network (GNN) techniques to generate high-quality matching orders for subgraph matching algorithms. Instead of using fixed heuristics to generate the matching order, our model can capture and make full use of the graph information, and thus determine the query vertex order with an adaptive learning-based rule that significantly reduces the number of redundant enumerations. With the help of the reinforcement learning framework, our model is able to consider the long-term benefits rather than only the local information at the current ordering step. Extensive experiments on six real-life data graphs demonstrate that our proposed matching order generation technique can reduce query processing time by up to two orders of magnitude compared to the state-of-the-art algorithms.
    Interpretation of Black Box NLP Models: A Survey. (arXiv:2203.17081v1 [cs.LG])
    An increasing number of machine learning models have been deployed in domains with high stakes such as finance and healthcare. Despite their superior performance, many models are black boxes in nature, which makes them hard to explain. There are growing efforts by researchers to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME, are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real-world data sets are provided to demonstrate the effectiveness of our method.
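    The CLT-based sample-size logic at the heart of this approach can be paraphrased as follows (a simplified sketch of the general idea; the paper's actual test concerns the stability of the selected explanation, and the `required_perturbations` helper is our illustration): keep drawing perturbation points until the normal-approximation confidence interval for an estimated quantity is narrower than a tolerance.

```python
import math

def required_perturbations(sample_var, tolerance, z=1.96):
    """CLT-based sample size: the half-width of a normal-approximation
    confidence interval is z * sd / sqrt(n), so requiring it to be below
    `tolerance` gives n >= (z * sd / tolerance)^2."""
    return math.ceil((z * math.sqrt(sample_var) / tolerance) ** 2)

# With sample sd 0.5 and a +/-0.05 tolerance at 95% confidence:
print(required_perturbations(sample_var=0.25, tolerance=0.05))  # 385 perturbation points
```

    The point of such a rule is that the number of perturbations is chosen adaptively from the observed variance, rather than fixed a priori, which is what stabilizes the resulting explanation.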
    Stability and Generalization Capabilities of Message Passing Graph Neural Networks. (arXiv:2202.00645v3 [cs.LG] UPDATED)
    Message passing neural networks (MPNNs) have seen a steep rise in popularity since their introduction as generalizations of convolutional neural networks to graph-structured data, and are now considered state-of-the-art tools for solving a large variety of graph-focused problems. We study the generalization capabilities of MPNNs in graph classification. We assume that graphs of different classes are sampled from different random graph models. Based on this data distribution, we derive a non-asymptotic bound on the generalization gap between the empirical and statistical loss that decreases to zero as the graphs become larger. This is proven by showing that an MPNN, applied to a graph, approximates the MPNN applied to the geometric model that the graph discretizes.
    BERTraffic: BERT-based Joint Speaker Role and Speaker Change Detection for Air Traffic Control Communications. (arXiv:2110.05781v2 [eess.AS] UPDATED)
    Automatic speech recognition (ASR) allows transcribing the communications between air traffic controllers (ATCOs) and aircraft pilots. The transcriptions are later used to extract ATC named entities, e.g., aircraft callsigns, command types, or values. One common challenge lies in the Speech Activity Detection (SAD) and diarization systems: if either fails, two or more single-speaker segments may remain in the same recording, jeopardizing the overall system's performance. We propose a system that combines the segmentation of a SAD module with a BERT model that performs speaker change detection (SCD) and speaker role detection (SRD) by chunking ASR transcripts, i.e., diarization with a defined number of speakers, together with SRD. The proposed model is evaluated on real-life ATC test sets. It reaches up to 0.90/0.95 F1-score on ATCO/pilot SRD, corresponding to a 27% relative improvement in diarization error rate (DER) compared to standard acoustic-based diarization. Results are measured on ASR transcripts of challenging ATC test sets with $\sim$13\% word error rate, and the robustness of the system is further validated on noisy ASR transcripts.
    Sequence Transduction with Graph-based Supervision. (arXiv:2111.01272v2 [cs.CL] UPDATED)
    The recurrent neural network transducer (RNN-T) objective plays a major role in building today's best automatic speech recognition (ASR) systems for production. Similarly to the connectionist temporal classification (CTC) objective, the RNN-T loss uses specific rules that define how a set of alignments is generated to form a lattice for the full-sum training. However, it is still largely unknown whether these rules are optimal and lead to the best possible ASR results. In this work, we present a new transducer objective function that generalizes the RNN-T loss to accept a graph representation of the labels, thus providing a flexible and efficient framework to manipulate training lattices, e.g., for studying different transition rules, implementing different transducer losses, or restricting alignments. We demonstrate that transducer-based ASR with a CTC-like lattice achieves better results than standard RNN-T, while also ensuring a strictly monotonic alignment, which will allow better optimization of the decoding procedure. For example, the proposed CTC-like transducer achieves an improvement of 4.8% on the test-other condition of LibriSpeech relative to an equivalent RNN-T based system.
    D2ADA: Dynamic Density-aware Active Domain Adaptation for Semantic Segmentation. (arXiv:2202.06484v3 [cs.CV] UPDATED)
    In the field of domain adaptation, a trade-off exists between the model performance and the number of target domain annotations. Active learning, maximizing model performance with few informative labeled data, comes in handy for such a scenario. In this work, we present D2ADA, a general active domain adaptation framework for semantic segmentation. To adapt the model to the target domain with minimum queried labels, we propose acquiring labels of the samples with high probability density in the target domain yet with low probability density in the source domain, complementary to the existing source domain labeled data. To further facilitate labeling efficiency, we design a dynamic scheduling policy to adjust the labeling budgets between domain exploration and model uncertainty over time. Extensive experiments show that our method outperforms existing active learning and domain adaptation baselines on two benchmarks, GTA5 -> Cityscapes and SYNTHIA -> Cityscapes. With less than 5% target domain annotations, our method reaches comparable results with that of full supervision.
    Bayesian optimization with known experimental and design constraints for chemistry applications. (arXiv:2203.17241v1 [math.OC])
    Optimization strategies driven by machine learning, such as Bayesian optimization, are being explored across experimental sciences as an efficient alternative to traditional design of experiment. When combined with automated laboratory hardware and high-performance computing, these strategies enable next-generation platforms for autonomous experimentation. However, the practical application of these approaches is hampered by a lack of flexible software and algorithms tailored to the unique requirements of chemical research. One such aspect is the pervasive presence of constraints in the experimental conditions when optimizing chemical processes or protocols, and in the chemical space that is accessible when designing functional molecules or materials. Although many of these constraints are known a priori, they can be interdependent, non-linear, and result in non-compact optimization domains. In this work, we extend our experiment planning algorithms Phoenics and Gryffin such that they can handle arbitrary known constraints via an intuitive and flexible interface. We benchmark these extended algorithms on continuous and discrete test functions with a diverse set of constraints, demonstrating their flexibility and robustness. In addition, we illustrate their practical utility in two simulated chemical research scenarios: the optimization of the synthesis of o-xylenyl Buckminsterfullerene adducts under constrained flow conditions, and the design of redox active molecules for flow batteries under synthetic accessibility constraints. The tools developed constitute a simple, yet versatile strategy to enable model-based optimization with known experimental constraints, contributing to its applicability as a core component of autonomous platforms for scientific discovery.
    GoSafeOpt: Scalable Safe Exploration for Global Optimization of Dynamical Systems. (arXiv:2201.09562v3 [cs.LG] UPDATED)
    Learning optimal control policies directly on physical systems is challenging since even a single failure can lead to costly hardware damage. Most existing model-free learning methods that guarantee safety, i.e., no failures, during exploration are limited to local optima. A notable exception is the GoSafe algorithm, which, unfortunately, cannot handle high-dimensional systems and hence cannot be applied to most real-world dynamical systems. This work proposes GoSafeOpt as the first algorithm that can safely discover globally optimal policies for high-dimensional systems while giving safety and optimality guarantees. We demonstrate the superiority of GoSafeOpt over competing model-free safe learning methods on a robot arm that would be prohibitive for GoSafe.
    BARC: Learning to Regress 3D Dog Shape from Images by Exploiting Breed Information. (arXiv:2203.15536v2 [cs.CV] UPDATED)
    Our goal is to recover the 3D shape and pose of dogs from a single image. This is a challenging task because dogs exhibit a wide range of shapes and appearances, and are highly articulated. Recent work has proposed to directly regress the SMAL animal model, with additional limb scale parameters, from images. Our method, called BARC (Breed-Augmented Regression using Classification), goes beyond prior work in several important ways. First, we modify the SMAL shape space to be more appropriate for representing dog shape. But, even with a better shape model, the problem of regressing dog shape from an image is still challenging because we lack paired images with 3D ground truth. To compensate for the lack of paired data, we formulate novel losses that exploit information about dog breeds. In particular, we exploit the fact that dogs of the same breed have similar body shapes. We formulate a novel breed similarity loss consisting of two parts: One term encourages the shape of dogs from the same breed to be more similar than dogs of different breeds. The second one, a breed classification loss, helps to produce recognizable breed-specific shapes. Through ablation studies, we find that our breed losses significantly improve shape accuracy over a baseline without them. We also compare BARC qualitatively to WLDO with a perceptual study and find that our approach produces dogs that are significantly more realistic. This work shows that a-priori information about genetic similarity can help to compensate for the lack of 3D training data. This concept may be applicable to other animal species or groups of species. Our code is publicly available for research purposes at https://barc.is.tue.mpg.de/.
    SGTR: End-to-end Scene Graph Generation with Transformer. (arXiv:2112.12970v3 [cs.CV] UPDATED)
    Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property. Most previous works adopt a bottom-up two-stage or a point-based one-stage approach, which often suffers from high time complexity or sub-optimal designs. In this work, we propose a novel SGG method to address the aforementioned issues, formulating the task as a bipartite graph construction problem. To solve the problem, we develop a transformer-based end-to-end framework that first generates the entity and predicate proposal set, followed by inferring directed edges to form the relation triplets. In particular, we develop a new entity-aware predicate representation based on a structural predicate generator that leverages the compositional property of relationships. Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner. Extensive experimental results show that our design is able to achieve the state-of-the-art or comparable performance on two challenging benchmarks, surpassing most of the existing approaches and enjoying higher efficiency in inference. We hope our model can serve as a strong baseline for the Transformer-based scene graph generation. Code is available: https://github.com/Scarecrow0/SGTR
    GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design. (arXiv:2112.11594v2 [cs.AR] UPDATED)
    Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model. However, it can be notoriously challenging to run GCN inference over large graph datasets, limiting their application to large real-world graphs and hindering the exploration of deeper and more sophisticated GCN models. This is because real-world graphs can be extremely large and sparse. Furthermore, node degrees tend to follow a power-law distribution, producing highly irregular adjacency matrices and thus prohibitive inefficiencies in both data processing and data movement, which substantially limit the achievable GCN acceleration efficiency. To this end, this paper proposes a GCN algorithm and accelerator co-design framework dubbed GCoD which can largely alleviate the aforementioned GCN irregularity and boost GCNs' inference efficiency. Specifically, on the algorithm level, GCoD integrates a split-and-conquer GCN training strategy that polarizes the graphs to be either denser or sparser in local neighborhoods without compromising the model accuracy, resulting in graph adjacency matrices that (mostly) have merely two levels of workload and enjoy largely enhanced regularity and thus ease of acceleration. On the hardware level, we further develop a dedicated two-pronged accelerator with a separate engine to process each of the aforementioned denser and sparser workloads, further boosting the overall utilization and acceleration efficiency. Extensive experiments and ablation studies validate that GCoD consistently reduces the number of off-chip accesses, leading to speedups of 15286x, 294x, 7.8x, and 2.5x over CPUs, GPUs, and the prior-art GCN accelerators HyGCN and AWB-GCN, respectively, while maintaining or even improving the task accuracy. Codes are available at https://github.com/RICE-EIC/GCoD.
    R2L: Distilling Neural Radiance Field to Neural Light Field for Efficient Novel View Synthesis. (arXiv:2203.17261v1 [cs.CV])
    Recent research explosion on Neural Radiance Field (NeRF) shows the encouraging potential to represent complex scenes with neural networks. One major drawback of NeRF is its prohibitive inference time: Rendering a single pixel requires querying the NeRF network hundreds of times. To resolve it, existing efforts mainly attempt to reduce the number of required sampled points. However, the problem of iterative sampling still exists. On the other hand, Neural Light Field (NeLF) presents a more straightforward representation over NeRF in novel view synthesis -- the rendering of a pixel amounts to one single forward pass without ray-marching. In this work, we present a deep residual MLP network (88 layers) to effectively learn the light field. We show the key to successfully learning such a deep NeLF network is to have sufficient data, for which we transfer the knowledge from a pre-trained NeRF model via data distillation. Extensive experiments on both synthetic and real-world scenes show the merits of our method over other counterpart algorithms. On the synthetic scenes, we achieve 26-35x FLOPs reduction (per camera ray) and 28-31x runtime speedup, meanwhile delivering significantly better (1.4-2.8 dB average PSNR improvement) rendering quality than NeRF without any customized implementation tricks.
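    The core efficiency argument, one forward pass of a deep residual MLP per ray instead of hundreds of NeRF queries along it, can be sketched as below. The widths, depth, and ray embedding here are toy assumptions (the paper's network has 88 layers and a task-specific ray parameterization):

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_mlp(x, weights, biases):
    """NeLF-style rendering sketch: a single forward pass of a deep residual
    MLP maps a ray representation directly to an output, with no ray-marching
    or per-sample-point network queries."""
    h = x
    for W, b in zip(weights, biases):
        h = h + np.maximum(0.0, h @ W + b)   # residual block with ReLU
    return h

d = 32                                        # toy ray-embedding width
layers = 8                                    # stands in for the paper's 88 layers
weights = [0.01 * rng.standard_normal((d, d)) for _ in range(layers)]
biases = [np.zeros(d) for _ in range(layers)]
ray = rng.standard_normal(d)
out = residual_mlp(ray, weights, biases)
print(out.shape)                              # (32,): one pass per ray
```

    The residual (skip) connections are what make a network this deep trainable; the knowledge to train it, per the abstract, is distilled from a pre-trained NeRF.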
    FBDNN: Filter Banks and Deep Neural Networks for Portable and Fast Brain-Computer Interfaces. (arXiv:2109.02165v4 [eess.SP] UPDATED)
    Objective: To propose novel SSVEP classification methodologies using deep neural networks (DNNs) and improve performance in single-channel and user-independent brain-computer interfaces (BCIs) with small data lengths. Approach: We propose the utilization of filter banks (creating sub-band components of the EEG signal) in conjunction with DNNs. In this context, we created three different models: a recurrent neural network (FBRNN) analyzing the time domain, a 2D convolutional neural network (FBCNN-2D) processing complex spectrum features, and a 3D convolutional neural network (FBCNN-3D) analyzing complex spectrograms, which we introduce in this study as a possible input for SSVEP classification. We tested our neural networks on three open datasets and conceived them so as not to require calibration from the final user, simulating a user-independent BCI. Results: The DNNs with filter banks surpassed the accuracy of similar networks without this preprocessing step by considerable margins, and they outperformed common SSVEP classification methods (SVM and FBCCA) by even higher margins. Conclusion and significance: Filter banks allow different types of deep neural networks to more efficiently analyze the harmonic components of SSVEP. Complex spectrograms carry more information than complex spectrum features and the magnitude spectrum, allowing the FBCNN-3D to surpass the other CNNs. The performance obtained in these challenging classification problems indicates a strong potential for the construction of portable, economical, fast, and low-latency BCIs.
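    The filter-bank preprocessing, splitting the EEG signal into sub-band components before feeding the DNN, can be sketched with an ideal FFT-based band-pass. The band edges, sampling rate, and the masking-based filter below are our illustrative assumptions; the paper's actual filter design may differ:

```python
import numpy as np

def fft_filter_bank(x, fs, bands):
    """Split a signal into sub-band components by zeroing FFT bins outside
    each band (an ideal band-pass filter)."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    subbands = []
    for lo, hi in bands:
        Xb = np.where((freqs >= lo) & (freqs < hi), X, 0.0)
        subbands.append(np.fft.irfft(Xb, n=len(x)))
    return np.stack(subbands)

fs = 250                                      # a typical EEG sampling rate
t = np.arange(fs) / fs                        # one second of signal
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 30 * t)  # 10 Hz + 30 Hz mix
sub = fft_filter_bank(x, fs, bands=[(5, 20), (20, 45)])
print(sub.shape)                              # (2, 250): one row per sub-band
```

    Each sub-band then isolates one group of SSVEP harmonics, which is the information the downstream FBRNN/FBCNN models exploit.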
    Local Learning Matters: Rethinking Data Heterogeneity in Federated Learning. (arXiv:2111.14213v2 [cs.LG] UPDATED)
    Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices). However, the data distribution among clients is often non-IID in nature, making efficient optimization difficult. To alleviate this issue, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by introducing a variety of proximal terms, some incurring considerable compute and/or memory overheads, to restrain local updates with respect to the global model. Instead, we consider rethinking solutions to data heterogeneity in FL with a focus on local learning generality rather than proximal restriction. To this end, we first present a systematic study informed by second-order indicators to better understand algorithm effectiveness in FL. Interestingly, we find that standard regularization methods are surprisingly strong performers in mitigating data heterogeneity effects. Based on our findings, we further propose a simple and effective method, FedAlign, to overcome data heterogeneity and the pitfalls of previous methods. FedAlign achieves competitive accuracy with state-of-the-art FL methods across a variety of settings while minimizing computation and memory overhead. Code is available at https://github.com/mmendiet/FedAlign
    MBORE: Multi-objective Bayesian Optimisation by Density-Ratio Estimation. (arXiv:2203.16912v1 [cs.LG])
    Optimisation problems often have multiple conflicting objectives that can be computationally and/or financially expensive. Mono-surrogate Bayesian optimisation (BO) is a popular model-based approach for optimising such black-box functions. It combines objective values via scalarisation and builds a Gaussian process (GP) surrogate of the scalarised values. The location which maximises a cheap-to-query acquisition function is chosen as the next location to expensively evaluate. While BO is an effective strategy, the use of GPs is limiting. Their performance decreases as the problem input dimensionality increases, and their computational complexity scales cubically with the amount of data. To address these limitations, we extend previous work on BO by density-ratio estimation (BORE) to the multi-objective setting. BORE links the computation of the probability of improvement acquisition function to that of probabilistic classification. This enables the use of state-of-the-art classifiers in a BO-like framework. In this work we present MBORE: multi-objective Bayesian optimisation by density-ratio estimation, and compare it to BO across a range of synthetic and real-world benchmarks. We find that MBORE performs as well as or better than BO on a wide variety of problems, and that it outperforms BO on high-dimensional and real-world problems.
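    The BORE reduction that MBORE builds on can be sketched as follows: label the best γ-fraction of observed (scalarised) objective values as positives, train a probabilistic classifier, and score candidates by their predicted probability of being positive, which acts like the probability-of-improvement acquisition. Here a simple k-NN estimate stands in for the state-of-the-art classifiers the paper uses, and the toy objective is our own:

```python
import numpy as np

def bore_acquisition(X, y, candidates, gamma=0.25, k=5):
    """BORE-style acquisition sketch: points in the best gamma-quantile of y
    are positives; a candidate's score is its estimated probability of being
    positive (here: the fraction of positive k-nearest neighbours)."""
    tau = np.quantile(y, gamma)
    labels = (y <= tau).astype(float)         # minimisation: low objective = good
    scores = []
    for c in candidates:
        d = np.linalg.norm(X - c, axis=1)
        nn = np.argsort(d)[:k]
        scores.append(labels[nn].mean())
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(100, 2))
y = (X ** 2).sum(axis=1)                      # toy objective, minimum at the origin
cand = np.array([[0.0, 0.0], [1.9, 1.9]])
scores = bore_acquisition(X, y, cand)
print(scores)                                 # the origin scores higher than the corner
```

    Because the surrogate is a classifier rather than a GP, the cost no longer scales cubically with the number of observations, which is the motivation the abstract gives.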
    A Temporal-oriented Broadcast ResNet for COVID-19 Detection. (arXiv:2203.17012v1 [cs.SD])
    Detecting COVID-19 from audio signals, such as breathing and coughing, can be used as a fast and efficient pre-testing method to reduce the virus transmission. Due to the promising results of deep learning networks in modelling time sequences, and since applications to rapidly identify COVID in-the-wild should require low computational effort, we present a temporal-oriented broadcasting residual learning method that achieves efficient computation and high accuracy with a small model size. Based on the EfficientNet architecture, our novel network, named Temporal-oriented ResNet~(TorNet), constitutes of a broadcasting learning block, i.e. the Alternating Broadcast (AB) Block, which contains several Broadcast Residual Blocks (BC ResBlocks) and a convolution layer. With the AB Block, the network obtains useful audio-temporal features and higher level embeddings effectively with much less computation than Recurrent Neural Networks~(RNNs), typically used to model temporal information. TorNet achieves 72.2% Unweighted Average Recall (UAR) on the INTERPSEECH 2021 Computational Paralinguistics Challenge COVID-19 cough Sub-Challenge, by this showing competitive results with a higher computational efficiency than other state-of-the-art alternatives.
    Online Learning for Traffic Routing under Unknown Preferences. (arXiv:2203.17150v1 [cs.LG])
    In transportation networks, users typically choose routes in a decentralized and self-interested manner to minimize their individual travel costs, which, in practice, often results in inefficient overall outcomes for society. As a result, there has been a growing interest in designing road tolling schemes to cope with these efficiency losses and steer users toward a system-efficient traffic pattern. However, the efficacy of road tolling schemes often relies on having access to complete information on users' trip attributes, such as their origin-destination (O-D) travel information and their values of time, which may not be available in practice. Motivated by this practical consideration, we propose an online learning approach to set tolls in a traffic network to drive heterogeneous users with different values of time toward a system-efficient traffic pattern. In particular, we develop a simple yet effective algorithm that adjusts tolls at each time period solely based on the observed aggregate flows on the roads of the network without relying on any additional trip attributes of users, thereby preserving user privacy. In the setting where the O-D pairs and values of time of users are drawn i.i.d. at each period, we show that our approach obtains an expected regret and road capacity violation of $O(\sqrt{T})$, where $T$ is the number of periods over which tolls are updated. Our regret guarantee is relative to an offline oracle that has complete information on users' trip attributes. We further establish an $\Omega(\sqrt{T})$ lower bound on the regret of any algorithm, which shows that our algorithm is optimal up to constants. Finally, we demonstrate the superior performance of our approach relative to several benchmarks on a real-world transportation network, thereby highlighting its practical applicability.
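    A toll update driven only by observed aggregate flows can be sketched as below. This is an illustrative assumption about the update's form, not the paper's exact rule: tolls rise on roads whose observed flow exceeds capacity and fall otherwise, with a projection onto non-negative tolls.

```python
import numpy as np

def update_tolls(tolls, observed_flows, capacities, step):
    """One period of a flow-only toll update (illustrative sketch): raise the
    toll on roads whose observed aggregate flow exceeds capacity, lower it
    otherwise, then project onto non-negative tolls."""
    tolls = tolls + step * (observed_flows - capacities)
    return np.maximum(tolls, 0.0)

tolls = np.zeros(3)
capacities = np.array([100.0, 80.0, 120.0])
flows = np.array([130.0, 60.0, 120.0])   # aggregate flows, e.g. from detectors
tolls = update_tolls(tolls, flows, capacities, step=0.01)
# only the over-capacity road (index 0) ends up with a positive toll
```

    The update uses no per-user trip attributes, which is what makes this style of scheme privacy-preserving.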
    Generative Flows with Invertible Attentions. (arXiv:2106.03959v4 [cs.LG] UPDATED)
    Flow-based generative models have shown an excellent ability to explicitly learn the probability density function of data via a sequence of invertible transformations. Yet, learning attention in generative flows remains understudied, even though attention has driven breakthroughs in other domains. To fill the gap, this paper introduces two types of invertible attention mechanisms, i.e., map-based and transformer-based attentions, for both unconditional and conditional generative flows. The key idea is to exploit a masked scheme of these two attentions to learn long-range data dependencies in the context of generative flows. The masked scheme allows for invertible attention modules with tractable Jacobian determinants, enabling their seamless integration at any position in flow-based models. The proposed attention mechanisms lead to more efficient generative flows, due to their capability of modeling long-term data dependencies. Evaluation on multiple image synthesis tasks shows that the proposed attention flows result in efficient models and compare favorably against state-of-the-art unconditional and conditional generative flows.
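    The masking trick that makes a module invertible with a tractable Jacobian can be illustrated with a plain coupling layer. This is a generic sketch of the principle, not the paper's attention modules: the conditioners here are toy functions standing in for masked attention; because the first half of the features passes through unchanged, the Jacobian is triangular and its log-determinant is just the sum of the log-scales.

```python
import numpy as np

def coupling_forward(x, scale_fn, shift_fn):
    """Masked (coupling-style) invertible map: the first half of the features
    passes through unchanged and conditions an affine transform of the second
    half, so the Jacobian is triangular with log-det = sum of log-scales."""
    x1, x2 = np.split(x, 2, axis=-1)
    log_s, t = scale_fn(x1), shift_fn(x1)
    y2 = x2 * np.exp(log_s) + t
    return np.concatenate([x1, y2], axis=-1), log_s.sum(axis=-1)

def coupling_inverse(y, scale_fn, shift_fn):
    """Exact inverse: y1 is untouched, so the same conditioner outputs can be
    recomputed and the affine map undone."""
    y1, y2 = np.split(y, 2, axis=-1)
    log_s, t = scale_fn(y1), shift_fn(y1)
    return np.concatenate([y1, (y2 - t) * np.exp(-log_s)], axis=-1)

# Toy conditioners; in the paper these would be masked attention modules.
scale_fn = lambda h: np.tanh(h)
shift_fn = lambda h: h ** 2
x = np.array([0.5, -1.0, 2.0, 0.3])
y, logdet = coupling_forward(x, scale_fn, shift_fn)
x_rec = coupling_inverse(y, scale_fn, shift_fn)   # recovers x exactly
```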
    Few-Shot Class-Incremental Learning by Sampling Multi-Phase Tasks. (arXiv:2203.17030v1 [cs.CV])
    New classes arise frequently in our ever-changing world, e.g., emerging topics in social media and new types of products in e-commerce. A model should recognize new classes and meanwhile maintain discriminability over old classes. Under severe circumstances, only limited novel instances are available to incrementally update the model. The task of recognizing few-shot new classes without forgetting old classes is called few-shot class-incremental learning (FSCIL). In this work, we propose a new paradigm for FSCIL based on meta-learning by LearnIng Multi-phase Incremental Tasks (LIMIT), which synthesizes fake FSCIL tasks from the base dataset. The data format of fake tasks is consistent with the `real' incremental tasks, and we can build a generalizable feature space for the unseen tasks through meta-learning. Besides, LIMIT also constructs a transformer-based calibration module, which calibrates the old class classifiers and new class prototypes into the same scale and fills in the semantic gap. The calibration module also adaptively contextualizes the instance-specific embedding with a set-to-set function. LIMIT efficiently adapts to new classes and meanwhile resists forgetting over old classes. Experiments on three benchmark datasets (CIFAR100, miniImageNet, and CUB200) and the large-scale ImageNet ILSVRC2012 dataset validate that LIMIT achieves state-of-the-art performance.
    Predicting Winners of the Reality TV Dating Show $\textit{The Bachelor}$ Using Machine Learning Algorithms. (arXiv:2203.16648v1 [cs.LG])
    $\textit{The Bachelor}$ is a reality TV dating show in which a single bachelor selects his wife from a pool of approximately 30 female contestants over eight weeks of filming (American Broadcasting Company 2002). We collected the following data on all 422 contestants that participated in seasons 11 through 25: their Age, Hometown, Career, Race, Week they got their first 1-on-1 date, whether they got the first impression rose, and what "place" they ended up getting. We then trained three machine learning models to predict the ideal characteristics of a successful contestant on $\textit{The Bachelor}$. The three algorithms that we tested were: random forest classification, neural networks, and linear regression. We found consistency across all three models, although the neural network performed the best overall. Our models found that a woman has the highest probability of progressing far on $\textit{The Bachelor}$ if she is: 26 years old, white, from the Northwest, works as a dancer, received a 1-on-1 in week 6, and did not receive the First Impression Rose. Our methodology is broadly applicable to all romantic reality television, and our results will inform future $\textit{The Bachelor}$ production and contestant strategies. While our models were relatively successful, we still encountered high misclassification rates. This may be because: (1) Our training dataset had fewer than 400 points or (2) Our models were too simple to parameterize the complex romantic connections contestants forge over the course of a season.
    When Physics Meets Machine Learning: A Survey of Physics-Informed Machine Learning. (arXiv:2203.16797v1 [cs.LG])
    Physics-informed machine learning (PIML), which combines prior physical knowledge, a high-level abstraction of natural phenomena and human behaviour accumulated over a long history, with data-driven machine learning models, has emerged as an effective way to mitigate the shortage of training data, increase models' generalizability, and ensure the physical plausibility of results. In this paper, we survey a large number of recent works in PIML and summarize them from three aspects: (1) motivations of PIML, (2) physics knowledge in PIML, and (3) methods of physics knowledge integration in PIML. We also discuss current challenges and corresponding research opportunities in PIML.
    A Derivation of Nesterov's Accelerated Gradient Algorithm from Optimal Control Theory. (arXiv:2203.17226v1 [math.OC])
    Nesterov's accelerated gradient algorithm is derived from first principles. The first principles are founded on the recently-developed optimal control theory for optimization. This theory frames an optimization problem as an optimal control problem whose trajectories generate various continuous-time algorithms. The algorithmic trajectories satisfy the necessary conditions for optimal control. The necessary conditions produce a controllable dynamical system for accelerated optimization. Stabilizing this system via a quadratic control Lyapunov function generates an ordinary differential equation. An Euler discretization of the resulting differential equation produces Nesterov's algorithm. In this context, this result solves the purported mystery surrounding the algorithm.
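    For reference, the discrete algorithm the derivation recovers is the standard Nesterov accelerated gradient scheme, shown here on a simple quadratic (the control-theoretic derivation itself is of course not reproduced):

```python
import numpy as np

def nesterov(grad, x0, lr, steps):
    """Standard discrete Nesterov accelerated gradient; the paper recovers
    this scheme as an Euler discretization of a stabilized control system."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(steps):
        x_next = y - lr * grad(y)                          # gradient step at the
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0  # extrapolated point
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)   # momentum extrapolation
        x, t = x_next, t_next
    return x

# Minimise f(x) = 0.5 * x^T A x, whose minimiser is the origin.
A = np.diag([1.0, 10.0])
x_star = nesterov(lambda x: A @ x, np.array([5.0, 5.0]), lr=0.1, steps=500)
# x_star ends up near the origin
```

    With step size $1/L$ the scheme guarantees $f(x_k) - f^* \le 2L\|x_0 - x^*\|^2/(k+1)^2$, the accelerated $O(1/k^2)$ rate.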
    Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load. (arXiv:2203.16637v1 [cs.SD])
    As a neurophysiological response to threat or adverse conditions, stress can affect cognition, emotion and behaviour with potentially detrimental effects on health in the case of sustained exposure. Since the affective content of speech is inherently modulated by an individual's physical and mental state, a substantial body of research has been devoted to the study of paralinguistic correlates of stress-inducing task load. Historically, voice stress analysis (VSA) has been conducted using conventional digital signal processing (DSP) techniques. Despite the development of modern methods based on deep neural networks (DNNs), accurately detecting stress in speech remains difficult due to the wide variety of stressors and considerable variability in the individual stress perception. To that end, we introduce a set of five datasets for task load detection in speech. The voice recordings were collected as either cognitive or physical stress was induced in the cohort of volunteers, with a cumulative number of more than a hundred speakers. We used the datasets to design and evaluate a novel self-supervised audio representation that leverages the effectiveness of handcrafted features (DSP-based) and the complexity of data-driven DNN representations. Notably, the proposed approach outperformed both extensive handcrafted feature sets and novel DNN-based audio representation learning approaches.
    An unsupervised cluster-level based method for learning node representations of heterogeneous graphs in scientific papers. (arXiv:2203.16751v1 [cs.LG])
    Learning knowledge representations of scientific paper data is an open problem, and learning the representations of paper nodes in scientific-paper heterogeneous networks is at its core. This paper proposes an unsupervised cluster-level heterogeneous graph node representation learning method (UCHL) for scientific papers, aiming to obtain representations of the nodes (authors, institutions, papers, etc.) in the heterogeneous graph. Based on the heterogeneous graph representation, we then perform link prediction on the entire heterogeneous graph to obtain the relationships between the nodes, that is, the relationships between papers. Experimental results show that the proposed method achieves excellent performance on multiple evaluation metrics on real scientific paper datasets.
    Neural Architecture Search for Speech Emotion Recognition. (arXiv:2203.16928v1 [cs.SD])
    Deep neural networks have brought significant advancements to speech emotion recognition (SER). However, the architecture design in SER is mainly based on expert knowledge and empirical (trial-and-error) evaluations, which is time-consuming and resource intensive. In this paper, we propose to apply neural architecture search (NAS) techniques to automatically configure the SER models. To accelerate the candidate architecture optimization, we propose a uniform path dropout strategy to encourage all candidate architecture operations to be equally optimized. Experimental results of two different neural structures on IEMOCAP show that NAS can improve SER performance (54.89\% to 56.28\%) while maintaining model parameter sizes. The proposed dropout strategy also shows superiority over the previous approaches.
    Acoustic-Net: A Novel Neural Network for Sound Localization and Quantification. (arXiv:2203.16988v1 [cs.SD])
    Acoustic source localization has been applied in different fields, such as aeronautics and ocean science, generally using multiple-microphone array data to reconstruct the source location. However, the model-based beamforming methods fail to achieve the high resolution of conventional beamforming maps. Deep neural networks are also suitable for locating sound sources, but in general these methods with complex network structures are hard to implement on hardware. In this paper, a novel neural network, termed the Acoustic-Net, is proposed to locate and quantify the sound source simply using the original signals. The experiments demonstrate that the proposed method significantly improves the accuracy of sound source prediction and the computing speed, which may generalize well to real data. The code and trained models are available at https://github.com/JoaquinChou/Acoustic-Net.
    A survey of neural models for the automatic analysis of conversation: Towards a better integration of the social sciences. (arXiv:2203.16891v1 [cs.CL])
    Some exciting new approaches to neural architectures for the analysis of conversation have been introduced over the past couple of years. These include neural architectures for detecting emotion, dialogue acts, and sentiment polarity. They take advantage of some of the key attributes of contemporary machine learning, such as recurrent neural networks with attention mechanisms and transformer-based approaches. However, while the architectures themselves are extremely promising, the phenomena to which they have been applied to date are but a small part of what makes conversation engaging. In this paper we survey these neural architectures and what they have been applied to. On the basis of the social science literature, we then describe what we believe to be the most fundamental and definitional feature of conversation, which is its co-construction over time by two or more interlocutors. We discuss how neural architectures of the sort surveyed could profitably be applied to these more fundamental aspects of conversation, and what this buys us in terms of a better analysis of conversation and even, in the longer term, a better way of generating conversation for a conversational system.
    Preventing Over-Smoothing for Hypergraph Neural Networks. (arXiv:2203.17159v1 [cs.LG])
    In recent years, hypergraph learning has attracted great attention due to its capacity for representing complex and high-order relationships. However, current neural network approaches designed for hypergraphs are mostly shallow, thus limiting their ability to extract information from high-order neighbors. In this paper, we show, both theoretically and empirically, that the performance of hypergraph neural networks does not improve as the number of layers increases, which is known as the over-smoothing problem. To tackle this issue, we develop a new deep hypergraph convolutional network called Deep-HGCN, which can maintain the heterogeneity of node representations in deep layers. Specifically, we prove that a $k$-layer Deep-HGCN simulates a polynomial filter of order $k$ with arbitrary coefficients, which can relieve the problem of over-smoothing. Experimental results on various datasets demonstrate the superior performance of the proposed model compared with state-of-the-art hypergraph learning approaches.
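    What "a polynomial filter of order $k$ with arbitrary coefficients" means can be made concrete on an ordinary graph Laplacian (a stand-in here for the hypergraph Laplacian; this is an illustration of the filter, not the Deep-HGCN architecture):

```python
import numpy as np

def polynomial_filter(L, X, coeffs):
    """Apply sum_i coeffs[i] * L^i to node features X. A deep network that can
    realise arbitrary polynomial coefficients keeps access to all frequency
    components of the signal, which is what prevents over-smoothing toward a
    near-constant representation."""
    out = np.zeros_like(X)
    power = np.eye(L.shape[0])
    for c in coeffs:
        out = out + c * (power @ X)
        power = power @ L
    return out

# Normalised Laplacian of a triangle graph.
A = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]])
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
L = np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt
X = np.array([[1.0], [0.0], [-1.0]])
filtered = polynomial_filter(L, X, coeffs=[0.5, 0.3, 0.2])
```

    Fixing all layers to the same smoothing operator corresponds to fixed coefficients, which is exactly the regime where stacking layers stops helping.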
    Learning from few examples with nonlinear feature maps. (arXiv:2203.16935v1 [cs.LG])
    In this work we consider the problem of data classification in post-classical settings where the number of training examples consists of just a few data points. We explore this phenomenon and reveal key relationships between the dimensionality of an AI model's feature space, non-degeneracy of data distributions, and the model's generalisation capabilities. The main thrust of our analysis is the influence of nonlinear feature transformations, mapping the original data into higher- and possibly infinite-dimensional spaces, on the resulting model's generalisation capabilities. Subject to appropriate assumptions, we establish new relationships between the intrinsic dimensions of the transformed data and the probability of learning successfully from few presentations.
    Ransomware Detection using Process Memory. (arXiv:2203.16871v1 [cs.CR])
    Ransomware attacks have increased significantly in recent years, causing great destruction and damage to critical systems and business operations. Attackers are unfailingly finding innovative ways to bypass detection mechanisms, which has encouraged the adoption of artificial intelligence. However, most research summarizes the general features of AI and induces many false positives, as the behavior of ransomware constantly changes to bypass detection. Focusing on the key indicating features of ransomware becomes vital, as this guides the investigator to the inner workings and main function of the ransomware itself. By utilizing access privileges in process memory, the main function of the ransomware can be detected more easily and accurately. Furthermore, new signatures and fingerprints of ransomware families can be identified to classify novel ransomware attacks correctly. The current research used the process-memory access privileges of an executable's different memory regions to quickly determine its intent before serious harm can occur. To achieve this aim, several well-known machine learning algorithms were explored, with accuracies ranging from 81.38% to 96.28%. The study thus confirms the feasibility of utilizing process memory as a detection mechanism for ransomware.
    An Optimal Control Method to Compute the Most Likely Transition Path for Stochastic Dynamical Systems with Jumps. (arXiv:2203.16874v1 [math.NA])
    Many complex real-world phenomena exhibit abrupt, intermittent or jumping behaviors, which are more suitably described by stochastic differential equations under non-Gaussian L\'evy noise. Among these complex phenomena, the most likely transition paths between metastable states are important since these rare events may have high impact in certain scenarios. Based on the large deviation principle, the most likely transition path can be treated as the minimizer of the rate function over paths that connect two points. One of the challenges in calculating the most likely transition path for stochastic dynamical systems under non-Gaussian L\'evy noise is that the associated rate function cannot be explicitly expressed in terms of paths. For this reason, we formulate an optimal control problem to obtain the optimal state as the most likely transition path. We then develop a neural network method to solve this problem. Several experiments are presented for both Gaussian and non-Gaussian cases.
    An analytic theory for the dynamics of wide quantum neural networks. (arXiv:2203.16711v1 [quant-ph])
    Parametrized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained for addressing learning problems. To date, many of the results on their performance on practical problems are heuristic in nature. In particular, the convergence rate for the training of quantum neural networks is not fully understood. Here, we analyze the dynamics of gradient descent for the training error of a class of variational quantum machine learning models. We define wide quantum neural networks as parameterized quantum circuits in the limit of a large number of qubits and variational parameters. We then find a simple analytic formula that captures the average behavior of their loss function and discuss the consequences of our findings. For example, for random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system. We finally validate our analytic results with numerical experiments.
    JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech. (arXiv:2203.16852v1 [eess.AS])
    In neural text-to-speech (TTS), two-stage systems, i.e., cascades of separately learned models, have shown synthesis quality close to human speech. For example, FastSpeech2 transforms an input text into a mel-spectrogram and then HiFi-GAN generates a raw waveform from the mel-spectrogram; they are called an acoustic feature generator and a neural vocoder, respectively. However, their training pipeline is somewhat cumbersome in that it requires fine-tuning and an accurate speech-text alignment for optimal performance. In this work, we present an end-to-end text-to-speech (E2E-TTS) model which has a simplified training pipeline and outperforms a cascade of separately learned models. Specifically, our proposed model jointly trains FastSpeech2 and HiFi-GAN with an alignment module. Since there is no acoustic feature mismatch between training and inference, it does not require fine-tuning. Furthermore, we remove the dependency on an external speech-text alignment tool by adopting an alignment learning objective in our joint training framework. Experiments on the LJSpeech corpus show that the proposed model outperforms publicly available, state-of-the-art implementations of ESPNet2-TTS on subjective evaluation (MOS) and some objective evaluations.
    Data-driven Prediction of Relevant Scenarios for Robust Optimization. (arXiv:2203.16642v1 [math.OC])
    In this work we study robust one- and two-stage problems with discrete uncertainty sets which are known to be hard to solve even if the underlying deterministic problem is easy. Popular solution methods iteratively generate scenario constraints and possibly second-stage variables. This way, by solving a sequence of smaller problems, it is often possible to avoid the complexity of considering all scenarios simultaneously. A key ingredient for the performance of the iterative methods is a good selection of start scenarios. In this paper we propose a data-driven heuristic to seed the iterative solution method with a set of starting scenarios that provide a strong lower bound early in the process, and result in considerably smaller overall solution times compared to other benchmark methods. Our heuristic learns the relevance of a scenario by extracting information from training data based on a combined similarity measure between robust problem instances and single scenarios. Our experiments show that predicting even a small number of good start scenarios by our method can considerably reduce the computation time of the iterative methods.
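    A seeding heuristic of the kind described can be sketched as follows. The combined score below is an assumption for illustration, not the paper's exact similarity measure: each stored scenario's learned relevance is blended with its feature similarity to the new problem instance, and the top-$k$ scenarios warm-start the iterative solver.

```python
import numpy as np

def select_start_scenarios(instance_feat, scenario_feats, relevance, k=3, alpha=0.5):
    """Illustrative seeding heuristic: combine each stored scenario's learned
    relevance with its feature similarity to the new problem instance and
    return the indices of the top-k scenarios to seed the robust solver."""
    d = np.linalg.norm(scenario_feats - instance_feat, axis=1)
    similarity = 1.0 / (1.0 + d)                   # in (0, 1], 1 = identical
    score = alpha * similarity + (1.0 - alpha) * relevance
    return np.argsort(score)[::-1][:k]

rng = np.random.default_rng(1)
scenario_feats = rng.normal(size=(10, 4))          # feature vectors of scenarios
relevance = rng.uniform(size=10)                   # learned from training data
instance_feat = scenario_feats[7] + 0.01           # instance resembling scenario 7
start = select_start_scenarios(instance_feat, scenario_feats, relevance)
```

    Good start scenarios give a strong lower bound early, so the constraint-generation loop terminates after fewer iterations.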
    Recovering models of open quantum systems from data via polynomial optimization: Towards globally convergent quantum system identification. (arXiv:2203.17164v1 [quant-ph])
    Current quantum devices suffer imperfections as a result of fabrication, as well as noise and dissipation as a result of coupling to their immediate environments. Because of this, it is often difficult to obtain accurate models of their dynamics from first principles. An alternative is to extract such models from time-series measurements of their behavior. Here, we formulate this system-identification problem as a polynomial optimization problem. Recent advances in optimization have provided globally convergent solvers for this class of problems, which, under our formulation, yield provably convergent estimates of the Kraus map or the Lindblad equation. We include an overview of the state-of-the-art algorithms, bounds, and convergence rates, and illustrate the use of this approach for modeling open quantum systems.
    SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. (arXiv:2203.16749v1 [eess.AS])
    Neural vocoder using denoising diffusion probabilistic model (DDPM) has been improved by adaptation of the diffusion noise distribution to given acoustic features. In this study, we propose SpecGrad that adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves the sound quality especially in the high-frequency bands. It is processed in the time-frequency domain to keep the computational cost almost the same as the conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveform than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.
    Distributional Robust Batch Contextual Bandits. (arXiv:2006.05630v4 [cs.LG] UPDATED)
    Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
    Calibrating constitutive models with full-field data via physics informed neural networks. (arXiv:2203.16577v1 [cs.LG])
    The calibration of solid constitutive models with full-field experimental data is a long-standing challenge, especially in materials which undergo large deformation. In this paper, we propose a physics-informed deep-learning framework for the discovery of constitutive model parameterizations given full-field displacement data and global force-displacement data. Contrary to the majority of recent literature in this field, we work with the weak form of the governing equations rather than the strong form to impose physical constraints upon the neural network predictions. The approach presented in this paper is computationally efficient, suitable for irregular geometric domains, and readily ingests displacement data without the need for interpolation onto a computational grid. A selection of canonical hyperelastic materials models suitable for different material classes is considered including the Neo-Hookean, Gent, and Blatz-Ko constitutive models as exemplars for general hyperelastic behavior, polymer behavior with lock-up, and compressible foam behavior respectively. We demonstrate that physics informed machine learning is an enabling technology and may shift the paradigm of how full-field experimental data is utilized to calibrate constitutive models under finite deformations.
    Mind the gap: Challenges of deep learning approaches to Theory of Mind. (arXiv:2203.16540v1 [cs.LG])
    Theory of Mind is an essential ability of humans to infer the mental states of others. Here we provide a coherent summary of the potential, current progress, and problems of deep learning approaches to Theory of Mind. We highlight that many current findings can be explained through shortcuts. These shortcuts arise because the tasks used to investigate Theory of Mind in deep learning systems have been too narrow. Thus, we encourage researchers to investigate Theory of Mind in complex open-ended environments. Furthermore, to inspire future deep learning systems we provide a concise overview of prior work done in humans. We further argue that when studying Theory of Mind with deep learning, the research's main focus and contribution ought to be opening up the network's representations. We recommend researchers use tools from the field of interpretability of AI to study the relationship between different network components and aspects of Theory of Mind.
    A Fast and Convergent Proximal Algorithm for Regularized Nonconvex and Nonsmooth Bi-level Optimization. (arXiv:2203.16615v1 [cs.LG])
    Many important machine learning applications involve regularized nonconvex bi-level optimization. However, the existing gradient-based bi-level optimization algorithms cannot handle nonconvex or nonsmooth regularizers, and they suffer from a high computation complexity in nonconvex bi-level optimization. In this work, we study a proximal gradient-type algorithm that adopts the approximate implicit differentiation (AID) scheme for nonconvex bi-level optimization with possibly nonconvex and nonsmooth regularizers. In particular, the algorithm applies Nesterov's momentum to accelerate the computation of the implicit gradient involved in AID. We provide a comprehensive analysis of the global convergence properties of this algorithm through identifying its intrinsic potential function. In particular, we formally establish the convergence of the model parameters to a critical point of the bi-level problem, and obtain an improved computation complexity $\mathcal{O}(\kappa^{3.5}\epsilon^{-2})$ over the state-of-the-art result. Moreover, we analyze the asymptotic convergence rates of this algorithm under a class of local nonconvex geometries characterized by a {\L}ojasiewicz-type gradient inequality. Experiments on hyper-parameter optimization demonstrate the effectiveness of our algorithm.
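    The momentum-plus-prox building block underlying such algorithms can be illustrated on a single-level problem. This is a standard FISTA sketch on a lasso instance, not the paper's bi-level method; the AID machinery for the implicit gradient is omitted, and only the interplay of Nesterov momentum with a nonsmooth regularizer's proximal operator is shown.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of the nonsmooth regularizer tau * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def accelerated_prox_grad(grad, prox, x0, lr, steps):
    """Proximal gradient with Nesterov momentum (FISTA): take the prox step
    at the momentum-extrapolated point y rather than at x."""
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(steps):
        x_next = prox(y - lr * grad(y), lr)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

# Lasso: minimise 0.5 * ||Ax - b||^2 + lam * ||x||_1.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
lam = 0.5
grad = lambda x: A.T @ (A @ x - b)
prox = lambda v, lr: soft_threshold(v, lr * lam)
x_hat = accelerated_prox_grad(grad, prox, np.zeros(5),
                              lr=1.0 / np.linalg.norm(A, 2) ** 2, steps=500)
```

    At a lasso optimum the KKT conditions force every component of the smooth gradient $A^\top(Ax-b)$ to have magnitude at most $\lambda$, which gives a cheap convergence check.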
    Towards Differential Relational Privacy and its use in Question Answering. (arXiv:2203.16701v1 [cs.LG])
    Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained model for question answering. We introduce Relational Memorization (RM) to understand, quantify and control this phenomenon. While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning. The difference is most pronounced when the data distribution is long-tailed, with many queries having only few training examples: Impeding general memorization prevents effective learning, while impeding only relational memorization still allows learning general properties of the underlying concepts. We formalize the notion of Relational Privacy (RP) and, inspired by Differential Privacy (DP), we provide a possible definition of Differential Relational Privacy (DrP). These notions can be used to describe and compute bounds on the amount of RM in a trained model. We illustrate Relational Privacy concepts in experiments with large-scale models for Question Answering.
    Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracle. (arXiv:2203.16668v1 [cs.LG])
    Many popular contextual bandit algorithms estimate reward models to inform decision making. However, true rewards can contain action-independent redundancies that are not relevant for decision making and only increase the statistical complexity of accurate estimation. It is sufficient and more data-efficient to estimate the simplest function that explains the reward differences between actions, that is, the heterogeneous treatment effect, commonly understood to be more structured and simpler than the reward. Motivated by this observation, building on recent work on oracle-based algorithms, we design a statistically optimal and computationally efficient algorithm using heterogeneous treatment effect estimation oracles. Our results provide the first universal reduction of contextual bandits to a general-purpose heterogeneous treatment effect estimation method. We show that our approach is more robust to model misspecification than reward estimation methods based on squared error regression oracles. Experimentally, we show the benefits of heterogeneous treatment effect estimation in contextual bandits over reward estimation.  ( 2 min )
    Monte Carlo Tree Search based Hybrid Optimization of Variational Quantum Circuits. (arXiv:2203.16707v1 [quant-ph])
    Variational quantum algorithms stand at the forefront of simulations on near-term and future fault-tolerant quantum devices. While most variational quantum algorithms involve only continuous optimization variables, the representational power of the variational ansatz can sometimes be significantly enhanced by adding certain discrete optimization variables, as is exemplified by the generalized quantum approximate optimization algorithm (QAOA). However, the hybrid discrete-continuous optimization problem in the generalized QAOA poses a challenge to the optimization. We propose a new algorithm called MCTS-QAOA, which combines a Monte Carlo tree search method with an improved natural policy gradient solver to optimize the discrete and continuous variables in the quantum circuit, respectively. We find that MCTS-QAOA has excellent noise-resilience properties and outperforms prior algorithms in challenging instances of the generalized QAOA.  ( 2 min )
    Recent improvements of ASR models in the face of adversarial attacks. (arXiv:2203.16536v1 [cs.CR])
    Like many other tasks involving neural networks, Speech Recognition models are vulnerable to adversarial attacks. However, recent research has pointed out differences between attacks and defenses on ASR models compared to image models. Improving the robustness of ASR models requires a paradigm shift from evaluating attacks on one or a few models to a systemic approach in evaluation. We lay the ground for such research by evaluating on various architectures a representative set of adversarial attacks: targeted and untargeted, optimization-based and speech processing-based, white-box and black-box attacks. Our results show that the relative strengths of different attack algorithms vary considerably when changing the model architecture, and that the results of some attacks are not to be blindly trusted. They also indicate that training choices such as self-supervised pretraining can significantly impact robustness by enabling transferable perturbations. We release our source code as a package that should help future research in evaluating their attacks and defenses.  ( 2 min )
    Parallel framework for Dynamic Domain Decomposition of Data Assimilation problems: a case study on the Kalman Filter algorithm. (arXiv:2203.16535v1 [cs.LG])
    We focus on Partial Differential Equation (PDE) based Data Assimilation (DA) problems solved by means of variational approaches and the Kalman filter algorithm. Recently, we presented a Domain Decomposition framework (we call it DD-DA, for short) performing a decomposition of the whole physical domain along space and time directions, and joining the ideas of Schwarz methods and parallel-in-time approaches. For effective parallelization of DD-DA algorithms, the computational load assigned to subdomains must be equally distributed. Usually, the computational cost is proportional to the amount of data entities assigned to partitions. Good quality partitioning also requires the volume of communication during calculation to be kept at its minimum. In order to deal with DD-DA problems where the observations are nonuniformly distributed and generally sparse, in the present work we employ a parallel load balancing algorithm based on an adaptive and dynamic definition of the DD boundaries, which aims to balance the workload according to data location. We call it DyDD. As the numerical model underlying DA problems arising from the so-called discretize-then-optimize approach is the constrained least squares model (CLS), we will use CLS as a reference state estimation problem and we validate DyDD on different scenarios.  ( 2 min )
    Active Learning for Computationally Efficient Distribution of Binary Evolution Simulations. (arXiv:2203.16683v1 [astro-ph.SR])
    Binary stars undergo a variety of interactions and evolutionary phases, critical for predicting and explaining observed properties. Binary population synthesis with full stellar-structure and evolution simulations is computationally expensive, requiring a large number of mass-transfer sequences. The recently developed binary population synthesis code POSYDON incorporates grids of MESA binary star simulations which are then interpolated to model large-scale populations of massive binaries. The traditional method of computing a high-density rectilinear grid of simulations is not scalable for higher-dimension grids, accounting for a range of metallicities, rotation, and eccentricity. We present a new active learning algorithm, psy-cris, which uses machine learning in the data-gathering process to adaptively and iteratively select targeted simulations to run, resulting in a custom, high-performance training set. We test psy-cris on a toy problem and find the resulting training sets require fewer simulations for accurate classification and regression than either regular or randomly sampled grids. We further apply psy-cris to the target problem of building a dynamic grid of MESA simulations, and we demonstrate that, even without fine-tuning, a simulation set of only $\sim 1/4$ the size of a rectilinear grid is sufficient to achieve the same classification accuracy. We anticipate further gains when algorithmic parameters are optimized for the targeted application. We find that optimizing for classification only may lead to performance losses in regression, and vice versa. Lowering the computational cost of producing grids will enable future versions of POSYDON to cover more input parameters while preserving interpolation accuracies.  ( 2 min )
    Physics-constrained Unsupervised Learning of Partial Differential Equations using Meshes. (arXiv:2203.16628v1 [cs.LG])
    Enhancing neural networks with knowledge of physical equations has become an efficient way of solving various physics problems, from fluid flow to electromagnetism. Graph neural networks show promise in accurately representing irregularly meshed objects and learning their dynamics, but have so far required supervision through large datasets. In this work, we represent meshes naturally as graphs, process these using Graph Networks, and formulate our physics-based loss to provide an unsupervised learning framework for partial differential equations (PDE). We quantitatively compare our results to a classical numerical PDE solver, and show that our computationally efficient approach can be used as an interactive PDE solver that adjusts boundary conditions in real time and remains sufficiently close to the baseline solution. Our inherently differentiable framework will enable the application of PDE solvers in interactive settings, such as model-based control of soft-body deformations, or in gradient-based optimization methods that require a fully differentiable pipeline.  ( 2 min )
    Federated Learning for the Classification of Tumor Infiltrating Lymphocytes. (arXiv:2203.16622v1 [eess.IV])
    We evaluate the performance of federated learning (FL) in developing deep learning models for analysis of digitized tissue sections. A classification application was considered as the example use case, quantifying the distribution of tumor infiltrating lymphocytes within whole slide images (WSIs). A deep learning classification model was trained using 50*50 square micron patches extracted from the WSIs. We simulated a FL environment in which a dataset, generated from WSIs of cancer from numerous anatomical sites made available by The Cancer Genome Atlas repository, is partitioned across 8 different nodes. Our results show that the model trained with the federated training approach achieves similar performance, both quantitatively and qualitatively, to that of a model trained with all the training data pooled at a centralized location. Our study shows that FL has tremendous potential for enabling development of more robust and accurate models for histopathology image analysis without having to collect large and diverse training data at a single location.  ( 2 min )
    Efficient Localness Transformer for Smart Sensor-Based Energy Disaggregation. (arXiv:2203.16537v1 [cs.LG])
    Modern smart sensor-based energy management systems leverage non-intrusive load monitoring (NILM) to predict and optimize appliance load distribution in real-time. NILM, or energy disaggregation, refers to the decomposition of electricity usage conditioned on the aggregated power signals (i.e., smart sensor on the main channel). Based on real-time appliance power prediction using sensory technology, energy disaggregation has great potential to increase electricity efficiency and reduce energy expenditure. With the introduction of transformer models, NILM has achieved significant improvements in predicting device power readings. Nevertheless, transformers are less efficient due to O(l^2) complexity w.r.t. sequence length l. Moreover, transformers can fail to capture local signal patterns in sequence-to-point settings due to the lack of inductive bias in local context. In this work, we propose an efficient localness transformer for non-intrusive load monitoring (ELTransformer). Specifically, we leverage normalization functions and switch the order of matrix multiplication to approximate self-attention and reduce computational complexity. Additionally, we introduce localness modeling with sparse local attention heads and relative position encodings to enhance the model capacity in extracting short-term local patterns. To the best of our knowledge, ELTransformer is the first NILM model that addresses computational complexity and localness modeling in NILM. With extensive experiments and quantitative analyses, we demonstrate the efficiency and effectiveness of the proposed ELTransformer with considerable improvements compared to state-of-the-art baselines.  ( 2 min )
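    The complexity reduction described above, normalizing queries and keys separately and regrouping the matrix products so that the l x l attention matrix is never formed, can be sketched generically. The sketch below is a plain-Python linearized-attention illustration under that assumption, not the authors' ELTransformer implementation; all names and shapes are illustrative.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def linear_attention(Q, K, V):
    """Approximate self-attention in O(l * d^2): softmax Q over the feature
    axis and K over the sequence axis, then compute K^T V (a small d x d_v
    matrix) first instead of the l x l matrix Q K^T."""
    Qn = [softmax(row) for row in Q]                         # row-wise
    Kn = transpose([softmax(col) for col in transpose(K)])   # column-wise
    context = matmul(transpose(Kn), V)                       # d x d_v, built once
    return matmul(Qn, context)                               # l x d_v

# toy sequence: l = 4 positions, d = 3 features, d_v = 2
Q = [[0.1, 0.2, 0.3], [0.5, 0.1, 0.4], [0.2, 0.2, 0.2], [0.9, 0.3, 0.1]]
K = [[0.3, 0.1, 0.6], [0.2, 0.5, 0.1], [0.7, 0.2, 0.3], [0.1, 0.4, 0.2]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
out = linear_attention(Q, K, V)
```

Because both normalizations produce rows of non-negative weights summing to one, each output row is a convex combination of the rows of V, which preserves the averaging character of standard attention while only ever materializing a d x d_v matrix.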
    Generation of Speaker Representations Using Heterogeneous Training Batch Assembly. (arXiv:2203.16646v1 [cs.SD])
    In traditional speaker diarization systems, a well-trained speaker model is a key component to extract representations from consecutive and partially overlapping segments in a long speech session. To be more consistent with the back-end segmentation and clustering, we propose a new CNN-based speaker modeling scheme, which takes into account the heterogeneity of the speakers in each training segment and batch. We randomly and synthetically augment the training data into a set of segments, each of which contains more than one speaker and some overlapping parts. A soft label is imposed on each segment based on its speaker occupation ratio, and the standard cross entropy loss is implemented in model training. In this way, the speaker model should have the ability to generate a geometrically meaningful embedding for each multi-speaker segment. Experimental results show that our system is superior to the baseline system using x-vectors in two speaker diarization tasks. In the CALLHOME task trained on the NIST SRE and Switchboard datasets, our system achieves a relative reduction of 12.93% in DER. In Track 2 of CHiME-6, our system provides 13.24%, 12.60%, and 5.65% relative reductions in DER, JER, and WER, respectively.  ( 2 min )
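    The soft-label construction above (a target derived from per-speaker occupation ratios, trained with standard cross entropy) can be sketched as follows. This is a minimal illustration of the labeling scheme as stated in the abstract; the function names and segment encoding are hypothetical, not the authors' code.

```python
import math

def soft_label(occupation, num_speakers):
    """Soft target for one training segment: `occupation` maps a speaker
    index to the fraction of the segment that speaker occupies."""
    total = sum(occupation.values())
    return [occupation.get(i, 0.0) / total for i in range(num_speakers)]

def cross_entropy(pred, target, eps=1e-12):
    """Standard cross-entropy loss between a predicted distribution
    and a soft target distribution."""
    return -sum(t * math.log(p + eps) for p, t in zip(pred, target))

# a segment where speaker 0 talks 60% of the time and speaker 2 talks 40%
y = soft_label({0: 0.6, 2: 0.4}, num_speakers=4)
loss_matched = cross_entropy([0.6, 0.0, 0.4, 0.0], y)   # prediction equals target
loss_wrong = cross_entropy([0.25, 0.25, 0.25, 0.25], y) # uniform prediction
```

Cross entropy against a soft target is minimized when the predicted distribution matches the occupation ratios, which is what pushes the embedding of a multi-speaker segment toward a geometrically meaningful mixture of the speakers involved.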
    Graph Refinement for Coreference Resolution. (arXiv:2203.16574v1 [cs.CL])
    The state-of-the-art models for coreference resolution are based on independent mention pair-wise decisions. We propose a modelling approach that learns coreference at the document-level and takes global decisions. For this purpose, we model coreference links in a graph structure where the nodes are tokens in the text, and the edges represent the relationship between them. Our model predicts the graph in a non-autoregressive manner, then iteratively refines it based on previous predictions, allowing global dependencies between decisions. The experimental results show improvements over various baselines, reinforcing the hypothesis that document-level information improves coreference resolution.  ( 2 min )
    Challenges in leveraging GANs for few-shot data augmentation. (arXiv:2203.16662v1 [stat.ML])
    In this paper, we explore the use of GAN-based few-shot data augmentation as a method to improve few-shot classification performance. We perform an exploration into how a GAN can be fine-tuned for such a task (one of which is in a class-incremental manner), as well as a rigorous empirical investigation into how well these models can perform to improve few-shot classification. We identify issues related to the difficulty of training such generative models under a purely supervised regime with very few examples, as well as issues regarding the evaluation protocols of existing works. We also find that in this regime, classification accuracy is highly sensitive to how the classes of the dataset are randomly split. Therefore, we propose a semi-supervised fine-tuning approach as a more pragmatic way forward to address these problems.  ( 2 min )
    Constrained Few-shot Class-incremental Learning. (arXiv:2203.16588v1 [cs.CV])
    Continually learning new classes from fresh data without forgetting previous knowledge of old classes is a very challenging research problem. Moreover, it is imperative that such learning must respect certain memory and computational constraints such as (i) training samples are limited to only a few per class, (ii) the computational cost of learning a novel class remains constant, and (iii) the memory footprint of the model grows at most linearly with the number of classes observed. To meet the above constraints, we propose C-FSCIL, which is architecturally composed of a frozen meta-learned feature extractor, a trainable fixed-size fully connected layer, and a rewritable dynamically growing memory that stores as many vectors as the number of encountered classes. C-FSCIL provides three update modes that offer a trade-off between accuracy and compute-memory cost of learning novel classes. C-FSCIL exploits hyperdimensional embedding that allows it to continually express many more classes than the fixed dimensions in the vector space, with minimal interference. The quality of class vector representations is further improved by aligning them quasi-orthogonally to each other by means of novel loss functions. Experiments on the CIFAR100, miniImageNet, and Omniglot datasets show that C-FSCIL outperforms the baselines with remarkable accuracy and compression. It also scales up to the largest problem size ever tried in this few-shot setting by learning 423 novel classes on top of 1200 base classes with less than 1.6% accuracy drop. Our code is available at https://github.com/IBM/constrained-FSCIL.  ( 2 min )
    Identification of diffracted vortex beams at different propagation distances using deep learning. (arXiv:2203.16539v1 [cs.LG])
    Orbital angular momentum (OAM) of light is regarded as a valuable resource in quantum technology, especially in quantum communication and quantum sensing and ranging. However, the OAM state of light is susceptible to undesirable experimental conditions such as propagation distance and phase distortions, which hinders the potential for the realistic implementation of relevant technologies. In this article, we exploit an enhanced deep learning neural network to identify different OAM modes of light at multiple propagation distances with phase distortions. Specifically, our trained deep learning neural network can efficiently identify the vortex beam's topological charge and propagation distance with 97% accuracy. Our technique has important implications for OAM-based communication and sensing protocols.  ( 2 min )
    FALCON: Fast Visual Concept Learning by Integrating Images, Linguistic descriptions, and Conceptual Relations. (arXiv:2203.16639v1 [cs.CV])
    We present a meta-learning framework for learning new visual concepts quickly, from just one or a few examples, guided by multiple naturally occurring data streams: simultaneously looking at images, reading sentences that describe the objects in the scene, and interpreting supplemental sentences that relate the novel concept with other concepts. The learned concepts support downstream applications, such as answering questions by reasoning about unseen images. Our model, namely FALCON, represents individual visual concepts, such as colors and shapes, as axis-aligned boxes in a high-dimensional space (the "box embedding space"). Given an input image and its paired sentence, our model first resolves the referential expression in the sentence and associates the novel concept with particular objects in the scene. Next, our model interprets supplemental sentences to relate the novel concept with other known concepts, such as "X has property Y" or "X is a kind of Y". Finally, it infers an optimal box embedding for the novel concept that jointly 1) maximizes the likelihood of the observed instances in the image, and 2) satisfies the relationships between the novel concepts and the known ones. We demonstrate the effectiveness of our model on both synthetic and real-world datasets.  ( 2 min )
    Transformer Language Models without Positional Encodings Still Learn Positional Information. (arXiv:2203.16634v1 [cs.CL])
    Transformers typically require some form of positional encoding, such as positional embeddings, to process natural language sequences. Surprisingly, we find that transformer language models without any explicit positional encoding are still competitive with standard models, and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire an implicit notion of absolute positions throughout the network, effectively compensating for the missing information. We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position.  ( 2 min )
    Machine Learning Approaches for Non-Intrusive Home Absence Detection Based on Appliance Electrical Use. (arXiv:2203.16538v1 [cs.LG])
    Home absence detection is an emerging field on smart home installations. Identifying whether or not the residents of the house are present, is important in numerous scenarios. Possible scenarios include but are not limited to: elderly people living alone, people suffering from dementia, home quarantine. The majority of published papers focus on either pressure/door sensors or cameras in order to detect outing events. Although the aforementioned approaches provide solid results, they are intrusive and require modifications for sensor placement. In our work, appliance electrical use is investigated as a means for detecting the presence or absence of residents. The energy use is the result of power disaggregation, a non-intrusive/non-invasive sensing method. Since a dataset providing energy data and ground truth for home absence is not available, artificial outing events were introduced on the UK-DALE dataset, a well known dataset for Non Intrusive Load Monitoring (NILM). Several machine learning algorithms were evaluated using the generated dataset. Benchmark results have shown that home absence detection using appliance power consumption is feasible.  ( 2 min )
    A Single-Timescale Method for Stochastic Bilevel Optimization. (arXiv:2102.04671v4 [math.OC] UPDATED)
    Stochastic bilevel optimization generalizes the classic stochastic optimization from the minimization of a single objective to the minimization of an objective function that depends on the solution of another optimization problem. Recently, stochastic bilevel optimization has been regaining popularity in emerging machine learning applications such as hyper-parameter optimization and model-agnostic meta learning. To solve this class of stochastic optimization problems, existing methods require either double-loop or two-timescale updates, which are sometimes less efficient. This paper develops a new optimization method for a class of stochastic bilevel problems that we term Single-Timescale stochAstic BiLevEl optimization (STABLE) method. STABLE runs in a single-loop fashion, and uses a single-timescale update with a fixed batch size. To achieve an $\epsilon$-stationary point of the bilevel problem, STABLE requires ${\cal O}(\epsilon^{-2})$ samples in total; and to achieve an $\epsilon$-optimal solution in the strongly convex case, STABLE requires ${\cal O}(\epsilon^{-1})$ samples. To the best of our knowledge, this is the first bilevel optimization algorithm achieving the same order of sample complexity as the stochastic gradient descent method for the single-level stochastic optimization.
    Flat-topped Probability Density Functions for Mixture Models. (arXiv:2203.17027v1 [cs.LG])
    This paper investigates probability density functions (PDFs) that are continuous everywhere, nearly uniform around the mode of distribution, and adaptable to a variety of distribution shapes ranging from bell-shaped to rectangular. From the viewpoint of computational tractability, the PDF based on the Fermi-Dirac or logistic function is advantageous in estimating its shape parameters. The most appropriate PDF for an $n$-variate distribution is of the form: $p\left(\mathbf{x}\right)\propto\left[\cosh\left(\left[\left(\mathbf{x}-\mathbf{m}\right)^{\mathsf{T}}\boldsymbol{\Sigma}^{-1}\left(\mathbf{x}-\mathbf{m}\right)\right]^{n/2}\right)+\cosh\left(r^{n}\right)\right]^{-1}$ where $\mathbf{x},\mathbf{m}\in\mathbb{R}^{n}$, $\boldsymbol{\Sigma}$ is an $n\times n$ positive definite matrix, and $r>0$ is a shape parameter. The flat-topped PDFs can be used as a component of mixture models in machine learning to improve the goodness of fit and make a model as simple as possible.
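    The displayed formula is easy to evaluate directly. The sketch below instantiates the 1-D case ($n=1$, $\boldsymbol{\Sigma}=\sigma^2$) up to the normalizing constant, which is enough to see the flat top: $\cosh(r^n)$ dominates the denominator while $|x-m|/\sigma \lesssim r$, so the density is nearly constant there and decays once the distance term takes over. Parameter values are illustrative.

```python
import math

def flat_top_unnorm(x, m=0.0, sigma=1.0, r=3.0, n=1):
    """Unnormalized flat-topped density from the paper's formula,
    p(x) ∝ [cosh(q(x)^(n/2)) + cosh(r^n)]^(-1),
    specialized to 1-D with q(x) = (x - m)^2 / sigma^2."""
    q = ((x - m) ** 2) / sigma ** 2
    return 1.0 / (math.cosh(q ** (n / 2)) + math.cosh(r ** n))

p_mode = flat_top_unnorm(0.0)   # at the mode
p_top = flat_top_unnorm(1.5)    # still on the flat top (|x| < r)
p_tail = flat_top_unnorm(9.0)   # far in the tail (|x| >> r)
```

With $r=3$ the density at $x=1.5$ stays within roughly 10% of its value at the mode, while at $x=9$ it has dropped by more than two orders of magnitude, matching the "nearly uniform around the mode" description.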
    Equivariant Diffusion for Molecule Generation in 3D. (arXiv:2203.17003v1 [cs.LG])
    This work introduces a diffusion model for molecule generation in 3D that is equivariant to Euclidean transformations. Our E(3) Equivariant Diffusion Model (EDM) learns to denoise a diffusion process with an equivariant network that jointly operates on both continuous (atom coordinates) and categorical features (atom types). In addition, we provide a probabilistic analysis which admits likelihood computation of molecules using our model. Experimentally, the proposed method significantly outperforms previous 3D molecular generative methods regarding the quality of generated samples and efficiency at training time.
    Model-based Reinforcement Learning: A Survey. (arXiv:2006.16712v4 [cs.LG] UPDATED)
    Sequential decision making, commonly formalized as Markov Decision Process (MDP) optimization, is an important challenge in artificial intelligence. Two key approaches to this problem are reinforcement learning (RL) and planning. This paper presents a survey of the integration of both fields, better known as model-based reinforcement learning. Model-based RL has two main steps. First, we systematically cover approaches to dynamics model learning, including challenges like dealing with stochasticity, uncertainty, partial observability, and temporal abstraction. Second, we present a systematic categorization of planning-learning integration, including aspects like: where to start planning, what budgets to allocate to planning and real data collection, how to plan, and how to integrate planning in the learning and acting loop. After these two sections, we also discuss implicit model-based RL as an end-to-end alternative for model learning and planning, and we cover the potential benefits of model-based RL. Along the way, the survey also draws connections to several related RL fields, like hierarchical RL and transfer learning. Altogether, the survey presents a broad conceptual overview of the combination of planning and learning for MDP optimization.
    STICC: A multivariate spatial clustering method for repeated geographic pattern discovery with consideration of spatial contiguity. (arXiv:2203.09611v2 [cs.LG] UPDATED)
    Spatial clustering has been widely used for spatial data mining and knowledge discovery. An ideal multivariate spatial clustering should consider both spatial contiguity and aspatial attributes. Existing spatial clustering approaches may face challenges for discovering repeated geographic patterns with spatial contiguity maintained. In this paper, we propose a Spatial Toeplitz Inverse Covariance-Based Clustering (STICC) method that considers both attributes and spatial relationships of geographic objects for multivariate spatial clustering. A subregion is created for each geographic object serving as the basic unit when performing clustering. A Markov random field is then constructed to characterize the attribute dependencies of subregions. Using a spatial consistency strategy, nearby objects are encouraged to belong to the same cluster. To test the performance of the proposed STICC algorithm, we apply it in two use cases. The comparison results with several baseline methods show that STICC significantly outperforms the others in terms of adjusted rand index and macro-F1 score. Join count statistics are also calculated and show that the spatial contiguity is well preserved by STICC. Such a spatial clustering method may benefit various applications in the fields of geography, remote sensing, transportation, and urban planning, etc.
    An analytic theory for the dynamics of wide quantum neural networks. (arXiv:2203.16711v1 [quant-ph])
    Parametrized quantum circuits can be used as quantum neural networks and have the potential to outperform their classical counterparts when trained for addressing learning problems. To date, many of the results on their performance on practical problems are heuristic in nature. In particular, the convergence rate for the training of quantum neural networks is not fully understood. Here, we analyze the dynamics of gradient descent for the training error of a class of variational quantum machine learning models. We define wide quantum neural networks as parameterized quantum circuits in the limit of a large number of qubits and variational parameters. We then find a simple analytic formula that captures the average behavior of their loss function and discuss the consequences of our findings. For example, for random quantum circuits, we predict and characterize an exponential decay of the residual training error as a function of the parameters of the system. We finally validate our analytic results with numerical experiments.
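    The predicted exponential decay of the residual training error is the same qualitative behavior gradient descent exhibits on an effectively quadratic loss. The toy below is a classical analogue for intuition only, not the paper's quantum model: on $L(w) = \tfrac{1}{2}\lambda w^2$, each gradient step scales $w$ by $(1-\eta\lambda)$, so the residual decays geometrically as $L_t = (1-\eta\lambda)^{2t} L_0$.

```python
def gd_residuals(w0=1.0, lam=0.5, eta=0.2, steps=10):
    """Gradient descent on the quadratic loss L(w) = 0.5 * lam * w**2.
    Each step multiplies w by (1 - eta*lam), so the residual error
    decays exponentially: L_t = (1 - eta*lam)**(2*t) * L_0."""
    w, out = w0, []
    for _ in range(steps):
        out.append(0.5 * lam * w * w)
        w -= eta * lam * w   # gradient step: dL/dw = lam * w
    return out

res = gd_residuals()
ratios = [res[t + 1] / res[t] for t in range(len(res) - 1)]
```

The ratio of consecutive residuals is the constant $(1-\eta\lambda)^2 = 0.81$ here, which is exactly the straight-line-on-a-log-plot signature of the exponential decay the abstract describes.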
    Neural Q-learning for solving elliptic PDEs. (arXiv:2203.17128v1 [math.NA])
    Solving high-dimensional partial differential equations (PDEs) is a major challenge in scientific computing. We develop a new numerical method for solving elliptic-type PDEs by adapting the Q-learning algorithm in reinforcement learning. Our "Q-PDE" algorithm is mesh-free and therefore has the potential to overcome the curse of dimensionality. Using a neural tangent kernel (NTK) approach, we prove that the neural network approximator for the PDE solution, trained with the Q-PDE algorithm, converges to the trajectory of an infinite-dimensional ordinary differential equation (ODE) as the number of hidden units $\rightarrow \infty$. For monotone PDEs (i.e. those given by monotone operators, which may be nonlinear), despite the lack of a spectral gap in the NTK, we then prove that the limit neural network, which satisfies the infinite-dimensional ODE, converges in $L^2$ to the PDE solution as the training time $\rightarrow \infty$. More generally, we can prove that any fixed point of the wide-network limit for the Q-PDE algorithm is a solution of the PDE (not necessarily under the monotone condition). The numerical performance of the Q-PDE algorithm is studied for several elliptic PDEs.
    Graph Node-Feature Convolution for Representation Learning. (arXiv:1812.00086v2 [cs.LG] UPDATED)
    Graph convolutional network (GCN) is an emerging neural network approach. It learns a new representation of a node by aggregating the feature vectors of all its neighbors, without considering whether those neighbors or features are useful or not. Recent methods have improved solutions by sampling a fixed size set of neighbors, or assigning different weights to different neighbors in the aggregation process, but features within a feature vector are still treated equally in the aggregation process. In this paper, we introduce a new convolution operation on regular size feature maps constructed from features of a fixed node bandwidth via sampling to get the first-level node representation, which is then passed to a standard GCN to learn the second-level node representation. Experiments show that our method outperforms competing methods in semi-supervised node classification tasks. Furthermore, our method opens new doors for exploring new GCN architectures, particularly deeper GCN models.
    MBORE: Multi-objective Bayesian Optimisation by Density-Ratio Estimation. (arXiv:2203.16912v1 [cs.LG])
    Optimisation problems often have multiple conflicting objectives that can be computationally and/or financially expensive. Mono-surrogate Bayesian optimisation (BO) is a popular model-based approach for optimising such black-box functions. It combines objective values via scalarisation and builds a Gaussian process (GP) surrogate of the scalarised values. The location which maximises a cheap-to-query acquisition function is chosen as the next location to expensively evaluate. While BO is an effective strategy, the use of GPs is limiting. Their performance decreases as the problem input dimensionality increases, and their computational complexity scales cubically with the amount of data. To address these limitations, we extend previous work on BO by density-ratio estimation (BORE) to the multi-objective setting. BORE links the computation of the probability of improvement acquisition function to that of probabilistic classification. This enables the use of state-of-the-art classifiers in a BO-like framework. In this work we present MBORE: multi-objective Bayesian optimisation by density-ratio estimation, and compare it to BO across a range of synthetic and real-world benchmarks. We find that MBORE performs as well as or better than BO on a wide variety of problems, and that it outperforms BO on high-dimensional and real-world problems.
    Learning from many trajectories. (arXiv:2203.17193v1 [cs.LG])
    We initiate a study of supervised learning from many independent sequences ("trajectories") of non-independent covariates, reflecting tasks in sequence modeling, control, and reinforcement learning. Conceptually, our multi-trajectory setup sits between two traditional settings in statistical learning theory: learning from independent examples and learning from a single auto-correlated sequence. Our conditions for efficient learning generalize the former setting: trajectories must be non-degenerate in ways that extend standard requirements for independent examples. They do not require that trajectories be ergodic, long, or strictly stable. For linear least-squares regression, given $n$-dimensional examples produced by $m$ trajectories, each of length $T$, we observe a notable change in statistical efficiency as the number of trajectories increases from a few (namely $m \lesssim n$) to many (namely $m \gtrsim n$). Specifically, we establish that the worst-case error rate for this problem is $\Theta(n / m T)$ whenever $m \gtrsim n$. Meanwhile, when $m \lesssim n$, we establish a (sharp) lower bound of $\Omega(n^2 / m^2 T)$ on the worst-case error rate, realized by a simple, marginally unstable linear dynamical system. A key upshot is that, in domains where trajectories regularly reset, the error rate eventually behaves as if all of the examples were independent altogether, drawn from their marginals. As a corollary of our analysis, we also improve guarantees for the linear system identification problem.
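    The two regimes in the abstract can be made concrete by evaluating the stated rates with constants and logarithmic factors dropped (an illustrative order-of-magnitude sketch, not the paper's exact bounds):

```python
def worst_case_rate(n, m, T):
    """Order-level error rates from the abstract (constants dropped):
    many trajectories (m >= n): Theta(n / (m * T));
    few trajectories  (m <  n): lower bound Omega(n**2 / (m**2 * T))."""
    if m >= n:
        return n / (m * T)
    return n ** 2 / (m ** 2 * T)

n, T = 10, 100
many = worst_case_rate(n, m=50, T=T)   # many-trajectory regime
few = worst_case_rate(n, m=2, T=T)     # few-trajectory regime
iid = n / (50 * T)                     # iid rate n / N with N = m * T samples
```

In the many-trajectory regime the rate coincides with the iid rate $n/N$ for $N = mT$ total samples, which is the "behaves as if all examples were independent" upshot; with few trajectories the extra $n/m$ factor makes the achievable rate strictly worse than the naive iid rate.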
    SpecGrad: Diffusion Probabilistic Model based Neural Vocoder with Adaptive Noise Spectral Shaping. (arXiv:2203.16749v1 [eess.AS])
    Neural vocoders using denoising diffusion probabilistic models (DDPMs) have been improved by adapting the diffusion noise distribution to given acoustic features. In this study, we propose SpecGrad, which adapts the diffusion noise so that its time-varying spectral envelope becomes close to the conditioning log-mel spectrogram. This adaptation by time-varying filtering improves the sound quality, especially in the high-frequency bands. It is processed in the time-frequency domain to keep the computational cost almost the same as that of conventional DDPM-based neural vocoders. Experimental results showed that SpecGrad generates higher-fidelity speech waveforms than conventional DDPM-based neural vocoders in both analysis-synthesis and speech enhancement scenarios. Audio demos are available at wavegrad.github.io/specgrad/.
    When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently?. (arXiv:2110.04184v2 [cs.LG] UPDATED)
    Multi-agent reinforcement learning has made substantial empirical progress in solving games with a large number of players. However, theoretically, the best known sample complexity for finding a Nash equilibrium in general-sum games scales exponentially in the number of players due to the size of the joint action space, and there is a matching exponential lower bound. This paper investigates what learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games with $H$ steps, $S$ states, and $A_i$ actions per player. First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5S\max_{i\le m} A_i / \epsilon^2)$ episodes, and an $\epsilon$-Correlated Equilibrium (CE) in $\widetilde{\mathcal{O}}(H^6S\max_{i\le m} A_i^2 / \epsilon^2)$ episodes. This is the first line of results for learning CCE and CE with sample complexities polynomial in $\max_{i\le m} A_i$. Our algorithm for learning CE integrates an adversarial bandit subroutine which minimizes a weighted swap regret, along with several novel designs in the outer loop. Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\epsilon$-approximate Nash equilibrium within $\widetilde{\mathcal{O}}(S\sum_{i\le m} A_i / \epsilon^3)$ episodes (when only highlighting the dependence on $S$, $A_i$, and $\epsilon$), which depends only linearly on $\sum_{i\le m} A_i$ and significantly improves over existing efficient algorithms in the $\epsilon$ dependence. Overall, our results shed light on what equilibria or structural assumptions on the game may enable sample-efficient learning with many players.
    When Physics Meets Machine Learning: A Survey of Physics-Informed Machine Learning. (arXiv:2203.16797v1 [cs.LG])
    Physics-informed machine learning (PIML), the combination of data-driven machine learning models with prior physical knowledge -- a high-level abstraction of natural phenomena and human behaviour built up over a long history -- has emerged as an effective way to mitigate the shortage of training data, increase models' generalizability, and ensure the physical plausibility of results. In this paper, we survey a large number of recent works in PIML and summarize them from three aspects: (1) motivations of PIML, (2) physics knowledge in PIML, and (3) methods of physics knowledge integration in PIML. We also discuss current challenges and corresponding research opportunities in PIML.
    Distributionally Robust Batch Contextual Bandits. (arXiv:2006.05630v4 [cs.LG] UPDATED)
    Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, and advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing when using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
    A statistical framework for efficient out of distribution detection in deep neural networks. (arXiv:2102.12967v3 [cs.LG] UPDATED)
    Background. Commonly, Deep Neural Networks (DNNs) generalize well on samples drawn from a distribution similar to that of the training set. However, DNNs' predictions are brittle and unreliable when the test samples are drawn from a dissimilar distribution. This is a major concern for deployment in real-world applications, where such behavior may come at a considerable cost, such as industrial production lines, autonomous vehicles, or healthcare applications. Contributions. We frame Out Of Distribution (OOD) detection in DNNs as a statistical hypothesis testing problem. Tests generated within our proposed framework combine evidence from the entire network. Unlike previous OOD detection heuristics, this framework returns a $p$-value for each test sample. It is guaranteed to maintain the Type I Error (T1E - incorrectly predicting OOD for an actual in-distribution sample) for test data. Moreover, this allows combining several detectors while maintaining the T1E. Building on this framework, we suggest a novel OOD procedure based on low-order statistics. Our method achieves comparable or better results than state-of-the-art methods on well-accepted OOD benchmarks, without retraining the network parameters or assuming prior knowledge of the test distribution -- and at a fraction of the computational cost.
    Causal Feature Selection for Algorithmic Fairness. (arXiv:2006.06053v2 [cs.LG] UPDATED)
    The use of machine learning (ML) in high-stakes societal decisions has encouraged the consideration of fairness throughout the ML lifecycle. Although data integration is one of the primary steps to generate high quality training data, most of the fairness literature ignores this stage. In this work, we consider fairness in the integration component of data management, aiming to identify features that improve prediction without adding any bias to the dataset. We work under the causal interventional fairness paradigm. Without requiring the underlying structural causal model a priori, we propose an approach to identify a sub-collection of features that ensure the fairness of the dataset by performing conditional independence tests between different subsets of features. We use group testing to improve the complexity of the approach. We theoretically prove the correctness of the proposed algorithm to identify features that ensure interventional fairness and show that sub-linear conditional independence tests are sufficient to identify these variables. A detailed empirical evaluation is performed on real-world datasets to demonstrate the efficacy and efficiency of our technique.
    Fast, Accurate and Memory-Efficient Partial Permutation Synchronization. (arXiv:2203.16505v2 [cs.CV] UPDATED)
    Previous partial permutation synchronization (PPS) algorithms, which are commonly used for multi-object matching, often involve computation-intensive and memory-demanding matrix operations. These operations become intractable for large scale structure-from-motion datasets. For pure permutation synchronization, the recent Cycle-Edge Message Passing (CEMP) framework suggests a memory-efficient and fast solution. Here we overcome the restriction of CEMP to compact groups and propose an improved algorithm, CEMP-Partial, for estimating the corruption levels of the observed partial permutations. It allows us to subsequently implement a nonconvex weighted projected power method without the need of spectral initialization. The resulting new PPS algorithm, MatchFAME (Fast, Accurate and Memory-Efficient Matching), only involves sparse matrix operations, and thus enjoys lower time and space complexities in comparison to previous PPS algorithms. We prove that under adversarial corruption, though without additive noise and with certain assumptions, CEMP-Partial is able to exactly classify corrupted and clean partial permutations. We demonstrate the state-of-the-art accuracy, speed and memory efficiency of our method on both synthetic and real datasets.
    Schema matching using Gaussian mixture models with Wasserstein distance. (arXiv:2111.14244v2 [cs.LG] UPDATED)
    Gaussian mixture models are a powerful tool, used mostly for clustering but, with proper preparation, also for feature extraction, pattern recognition, image segmentation, and machine learning in general. When faced with the problem of schema matching, different mixture models computed on different pieces of data can retain crucial information about the structure of the dataset. To measure or compare results from mixture models, the Wasserstein distance can be very useful; however, it is not easy to calculate for mixture distributions. In this paper we derive one possible approximation of the Wasserstein distance between Gaussian mixture models and reduce it to a linear problem. Furthermore, application examples concerning real-world data are shown.
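    The starting point for such approximations is that the 2-Wasserstein distance between two individual Gaussians has a closed form. The sketch below illustrates that building block only (it is not the paper's method); mixture-level approximations typically combine these pairwise component distances via an optimal-transport plan over the mixture weights.

    ```python
    import numpy as np
    from scipy.linalg import sqrtm

    def gaussian_w2(m1, C1, m2, C2):
        """Closed-form 2-Wasserstein distance between N(m1, C1) and N(m2, C2):
        W2^2 = ||m1 - m2||^2 + Tr(C1 + C2 - 2 (C2^{1/2} C1 C2^{1/2})^{1/2})."""
        sC2 = sqrtm(C2)
        cross = sqrtm(sC2 @ C1 @ sC2)
        # sqrtm can return tiny imaginary parts due to numerical error
        bures = np.trace(C1 + C2 - 2 * np.real(cross))
        return np.sqrt(np.sum((m1 - m2) ** 2) + max(float(bures), 0.0))

    # identical Gaussians are at distance 0; with equal covariances,
    # the distance reduces to the Euclidean distance between the means
    m, C = np.zeros(2), np.eye(2)
    d0 = gaussian_w2(m, C, m, C)
    d1 = gaussian_w2(m, C, m + np.array([3.0, 4.0]), C)  # -> 5.0
    ```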
    Flexible and Efficient Contextual Bandits with Heterogeneous Treatment Effect Oracle. (arXiv:2203.16668v1 [cs.LG])
    Many popular contextual bandit algorithms estimate reward models to inform decision making. However, true rewards can contain action-independent redundancies that are not relevant for decision making and only increase the statistical complexity of accurate estimation. It is sufficient and more data-efficient to estimate the simplest function that explains the reward differences between actions, that is, the heterogeneous treatment effect, commonly understood to be more structured and simpler than the reward. Motivated by this observation, building on recent work on oracle-based algorithms, we design a statistically optimal and computationally efficient algorithm using heterogeneous treatment effect estimation oracles. Our results provide the first universal reduction of contextual bandits to a general-purpose heterogeneous treatment effect estimation method. We show that our approach is more robust to model misspecification than reward estimation methods based on squared error regression oracles. Experimentally, we show the benefits of heterogeneous treatment effect estimation in contextual bandits over reward estimation.
    System Identification via Nuclear Norm Regularization. (arXiv:2203.16673v1 [stat.ML])
    This paper studies the problem of identifying low-order linear systems via Hankel nuclear norm regularization. Hankel regularization encourages the low-rankness of the Hankel matrix, which maps to the low-orderness of the system. We provide novel statistical analysis for this regularization and carefully contrast it with the unregularized ordinary least-squares (OLS) estimator. Our analysis leads to new bounds on estimating the impulse response and the Hankel matrix associated with the linear system. We first design an input excitation and show that Hankel regularization enables one to recover the system using the optimal number of observations in the true system order and achieve strong statistical estimation rates. Surprisingly, we demonstrate that the input design indeed matters, by showing that intuitive choices such as i.i.d. Gaussian input lead to provably sub-optimal sample complexity. To better understand the benefits of regularization, we also revisit the OLS estimator. Besides refining existing bounds, we experimentally identify when the regularized approach improves over OLS: (1) for low-order systems with slow impulse-response decay, the OLS method performs poorly in terms of sample complexity; (2) the Hankel matrix returned by regularization has a clearer singular-value gap that eases identification of the system order; (3) Hankel regularization is less sensitive to hyperparameter choice. Finally, we establish model selection guarantees through a joint train-validation procedure where we tune the regularization parameter for near-optimal estimation.
    Challenges in leveraging GANs for few-shot data augmentation. (arXiv:2203.16662v1 [stat.ML])
    In this paper, we explore the use of GAN-based few-shot data augmentation as a method to improve few-shot classification performance. We perform an exploration into how a GAN can be fine-tuned for such a task (one of which is in a class-incremental manner), as well as a rigorous empirical investigation into how well these models can perform to improve few-shot classification. We identify issues related to the difficulty of training such generative models under a purely supervised regime with very few examples, as well as issues regarding the evaluation protocols of existing works. We also find that in this regime, classification accuracy is highly sensitive to how the classes of the dataset are randomly split. Therefore, we propose a semi-supervised fine-tuning approach as a more pragmatic way forward to address these problems.
    Spatially Adaptive Online Prediction of Piecewise Regular Functions. (arXiv:2203.16587v1 [math.ST])
    We consider the problem of estimating piecewise regular functions in an online setting, i.e., the data arrive sequentially and at any round our task is to predict the value of the true function at the next revealed point using the available data from past predictions. We propose a suitably modified version of a recently developed online learning algorithm called the sleeping experts aggregation algorithm. We show that this estimator satisfies oracle risk bounds simultaneously for all local regions of the domain. As concrete instantiations of the expert aggregation algorithm proposed here, we study an online mean aggregation and an online linear regression aggregation algorithm where experts correspond to the set of dyadic subrectangles of the domain. The resulting algorithms are near linear time computable in the sample size. We specifically focus on the performance of these online algorithms in the context of estimating piecewise polynomial and bounded variation function classes in the fixed design setup. The simultaneous oracle risk bounds we obtain for these estimators in this context provide new and improved (in certain aspects) guarantees even in the batch setting and are not available for the state of the art batch learning estimators.
    Towards Differential Relational Privacy and its use in Question Answering. (arXiv:2203.16701v1 [cs.LG])
    Memorization of the relation between entities in a dataset can lead to privacy issues when using a trained model for question answering. We introduce Relational Memorization (RM) to understand, quantify and control this phenomenon. While bounding general memorization can have detrimental effects on the performance of a trained model, bounding RM does not prevent effective learning. The difference is most pronounced when the data distribution is long-tailed, with many queries having only a few training examples: Impeding general memorization prevents effective learning, while impeding only relational memorization still allows learning general properties of the underlying concepts. We formalize the notion of Relational Privacy (RP) and, inspired by Differential Privacy (DP), we provide a possible definition of Differential Relational Privacy (DrP). These notions can be used to describe and compute bounds on the amount of RM in a trained model. We illustrate Relational Privacy concepts in experiments with large-scale models for Question Answering.
    Wind Farm Layout Optimisation using Set Based Multi-objective Bayesian Optimisation. (arXiv:2203.17065v1 [stat.ML])
    Wind energy is one of the cleanest renewable electricity sources and can help in addressing the challenge of climate change. One of the drawbacks of wind-generated energy is the large space necessary to install a wind farm; this arises from the fact that placing wind turbines in a limited area would hinder their productivity and therefore not be economically convenient. This naturally leads to an optimisation problem with three specific challenges: (1) multiple conflicting objectives, (2) computationally expensive simulation models, and (3) optimisation over design sets instead of design vectors. The first and second challenges can be addressed by surrogate-assisted optimisation, e.g. Bayesian multi-objective optimisation. However, traditional Bayesian optimisation cannot be applied directly, as the objective function in this problem relies on design sets instead of design vectors. This paper extends the applicability of Bayesian multi-objective optimisation to set-based optimisation for solving the wind farm layout problem. We use a set-based kernel in a Gaussian process to quantify the correlation between wind farms (with a different number of turbines). The results on the given data set of wind energy and direction clearly show the potential of using set-based Bayesian multi-objective optimisation.
    Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization. (arXiv:2107.12438v3 [math.OC] UPDATED)
    Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, utilizes all data when training and, hence, is well-suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly-coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.
    An energy-based deep splitting method for the nonlinear filtering problem. (arXiv:2203.17153v1 [stat.CO])
    The main goal of this paper is to approximately solve the nonlinear filtering problem through deep learning. This is achieved by solving the Zakai equation by a deep splitting method, previously developed for approximate solution of (stochastic) partial differential equations. This is combined with an energy-based model for the approximation of functions by a deep neural network. This results in a computationally fast filter that takes observations as input and that does not require re-training when new observations are received. The method is tested on three examples, one linear Gaussian and two nonlinear. The method shows promising performance when benchmarked against the Kalman filter and the bootstrap particle filter.
    Recommender Systems meet Mechanism Design. (arXiv:2110.12558v2 [cs.GT] UPDATED)
    Machine learning has developed a variety of tools for learning and representing high-dimensional distributions with structure. Recent years have also seen big advances in designing multi-item mechanisms. Akin to overfitting, however, these mechanisms can be extremely sensitive to the Bayesian prior that they target, which becomes problematic when that prior is only approximately known. At the same time, even if access to the exact Bayesian prior is given, it is known that optimal or even approximately optimal multi-item mechanisms run into sample, computational, representation and communication intractability barriers. We consider a natural class of multi-item mechanism design problems with very large numbers of items, but where the bidders' value distributions can be well-approximated by a topic model akin to those used in recommendation systems with very large numbers of possible recommendations. We propose a mechanism design framework for this setting, building on a recent robustification framework by Brustle et al., which disentangles the statistical challenge of estimating a multi-dimensional prior from the task of designing a good mechanism for it, and robustifies the performance of the latter against the estimation error of the former. We provide an extension of this framework appropriate for our setting, which allows us to exploit the expressive power of topic models to reduce the effective dimensionality of the mechanism design problem and remove the dependence of its computational, communication and representation complexity on the number of items.
    A Unifying Framework for Reinforcement Learning and Planning. (arXiv:2006.15009v4 [cs.LG] UPDATED)
    Sequential decision making, commonly formalized as optimization of a Markov Decision Process, is a key challenge in artificial intelligence. Two successful approaches to MDP optimization are reinforcement learning and planning, which both largely have their own research communities. However, if both research fields solve the same problem, then we might be able to disentangle the common factors in their solution approaches. Therefore, this paper presents a unifying algorithmic framework for reinforcement learning and planning (FRAP), which identifies underlying dimensions on which MDP planning and learning algorithms have to decide. At the end of the paper, we compare a variety of well-known planning, model-free and model-based RL algorithms along these dimensions. Altogether, the framework may help provide deeper insight in the algorithmic design space of planning and reinforcement learning.

  • Open

    Seeking respondents for a survey about AI text generation and reader interpretation of poetry
    Hello! I am doing an independent (non-academic) research study about AI text generation as it relates to poetry and reader interpretation. The results of the study will be presented in a YouTube video. I would really appreciate it if some folks could take approximately 20-25 minutes to take this anonymous survey I put together. It involves reading some poems and answering questions about those poems. Thank you so much for the help! https://form.jotform.com/220880249866062 submitted by /u/northern_frog [link] [comments]  ( 1 min )
    The Token-Dropping Approach Used By ML Researchers From Google and NYU Reduces BERT Pretraining Time And Cost By 25%
    The pretraining of BERT-type large language models, which may scale up to billions of parameters, is essential to achieving best-in-class performance on various natural language processing (NLP) applications. However, the pretraining procedure is costly, and it has become a hurdle for the industrial deployment of big language models. In a research paper, researchers from Google, New York University, and the University of Maryland recommend a simple but effective “token dropping” method that drastically reduces the pretraining cost of transformer models like BERT while maintaining downstream fine-tuning performance. Token dropping is a technique for speeding up the pretraining of transformer models like BERT without sacrificing their performance on downstream tasks. Starting with an intermediate layer in the model, they eliminate uninteresting tokens to let the model focus on key tokens more effectively, given its limited computing resources. The model’s last layer then picks up the dropped tokens, producing full-length sequences. They use the built-in masked language modeling (MLM) loss and its dynamics to detect non-essential tokens with little computing complexity. According to their tests, this straightforward strategy decreases BERT’s pretraining cost by 25% while yielding somewhat higher overall fine-tuning performance on conventional downstream tasks. Continue reading the summary. Paper: https://arxiv.org/pdf/2203.13240.pdf Github: https://github.com/tensorflow/models/tree/master/official/projects/token_dropping submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
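    The routing idea can be sketched independently of BERT itself: score each token by its running MLM loss and keep only the hardest tokens in the middle layers, bypassing the rest. The numpy sketch below is illustrative only; the function name and `keep_ratio` parameter are my own assumptions, not the paper's API.

    ```python
    import numpy as np

    def token_dropping_schedule(token_losses, keep_ratio=0.5):
        """Split a sequence into 'important' tokens (processed by the middle
        layers) and 'unimportant' tokens (bypassed until the last layer),
        ranked by per-token MLM loss."""
        n = len(token_losses)
        k = max(1, int(n * keep_ratio))
        order = np.argsort(token_losses)[::-1]  # highest loss = hardest = keep
        keep = np.sort(order[:k])               # preserve original token order
        drop = np.sort(order[k:])
        return keep, drop

    losses = np.array([0.1, 2.3, 0.05, 1.7, 0.4, 0.9])
    keep, drop = token_dropping_schedule(losses, keep_ratio=0.5)
    # keeps the 3 hardest tokens (indices 1, 3, 5); bypasses indices 0, 2, 4
    ```

    The last layer would then re-insert the bypassed hidden states so the output sequence is full length again.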
    Top emerging artificial intelligence use cases
    submitted by /u/Visionifyai [link] [comments]
    Meta’s new speech AI can laugh, scream, yawn, and chit-chat
    Meta unveils new research on speech AI: Machine-generated voices can now cry, laugh, yawn or make more natural small talk. ... Meta’s speech AI can now mimic emotional sounds such as laughing, yawning, or crying – which it says is important in communication to better convey the intention and context of a statement. ... the new GSLM model dGSLM, which is optimized for dialogs, generates more natural-sounding audio dialogs using AI agents that can pause for thought or process overlaps in conversations. ... dGSLM was trained with about 2000 hours of unlabeled audio dialogues from the Fisher dataset, which contains about 16000 English-language telephone conversations. Source and demos: https://mixed-news.com/en/meta-new-speech-ai-can-laugh-scream-yawn/ submitted by /u/Sephirio [link] [comments]  ( 1 min )
    Oracle Releases MySQL HeatWave ML That Adds Powerful Machine Learning Capabilities to MySQL Applications
    Integrating machine learning capabilities into MySQL systems is prohibitively difficult and time-consuming. The process involves extracting data from the database into another system to construct and deploy machine learning models. As data flows around, this strategy produces silos for applying machine learning to application data and causes latency. It also results in data leakage, making the database more open to security attacks. Moreover, existing machine learning (ML) solutions lack the ability to explain why the models developers build deliver specific predictions. Recently, Oracle released MySQL HeatWave, the only MySQL cloud database service that supports in-database machine learning (ML). It automates the ML lifecycle and saves all trained models in the MySQL database, removing the need to migrate data or models to a separate machine learning tool or service. This decreases application complexity, saves costs, and increases data and model security. It produces a model with the best algorithm, features, and hyper-parameters for a specific data collection and application. Continue Reading submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    Cloudy World AI Art
    submitted by /u/Recent_Coffee_2551 [link] [comments]
    How Chatbot is beneficial in the Retail Industry?
    By offering your clients a blended virtual, tailored, and on-the-spot conversational experience, you can engage them for longer and create an interactive way of selling to them. Chatbots armed with Conversational AI - and powered by an in-depth understanding of the client and their past behaviors - will be able to offer the service that is most likely to suit each client. Retailers can take advantage of the ability to engage potential customers through virtual attendants, along with a growing range of new and exciting ways to utilize chatbots within retail spaces. A lack of customer engagement through mobile messaging costs retailers around $1 trillion annually, as customers are increasingly reluctant to contact mainstream outlets for recommendations due to negative past experiences, or because they have already made up their mind about what they want. This can potentially be overcome with chatbots: in-store systems powered by AI that can be deployed across multiple channels, including the outlet's website and branded social media profiles. submitted by /u/botgo_io [link] [comments]  ( 1 min )
    Game AI Question (Retraining without losing characteristics)
    submitted by /u/ICouldDoButWhyWouldI [link] [comments]  ( 1 min )
    [Disco Diffusion v5] - "Bullet Time of Blood Animation"
    submitted by /u/JoshGrambo [link] [comments]
    If you could change one thing about building ML, what would it be?
    I’d like to open this question up to people who are beginners, intermediates and experienced in the field of ML to get a wide variety of perspectives. If you could change/significantly improve one thing about building ML systems, what would it be? Some examples could be: reducing the computational overhead; reducing or eliminating the need for large datasets; simplifying the process of constructing models. However, it’s not limited to just those three. Curious to see where this goes! submitted by /u/holamyeung [link] [comments]  ( 1 min )
  • Open

    A Gentle Introduction to Decorators in Python
    When working on code, whether we know it or not, we often come across the decorator design pattern. This is […] The post A Gentle Introduction to Decorators in Python appeared first on Machine Learning Mastery.  ( 13 min )
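    For readers who want the gist without clicking through: a decorator is just a function that takes a function and returns a wrapped version of it. A minimal, self-contained example (my own illustration, not taken from the linked post):

    ```python
    import functools
    import time

    def timed(func):
        """Decorator that wraps func and records how long each call took."""
        @functools.wraps(func)  # preserve func's name and docstring
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = func(*args, **kwargs)
            wrapper.last_runtime = time.perf_counter() - start
            return result
        return wrapper

    @timed  # equivalent to: add = timed(add)
    def add(a, b):
        return a + b

    total = add(2, 3)  # returns 5; add.last_runtime holds the elapsed seconds
    ```

    Thanks to `functools.wraps`, the decorated function still reports its original name, which matters for debugging and introspection.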
  • Open

    [P] How to add padding to an image in Pytorch?
    Hi guys, I am trying to add padding to images in Pytorch - I need to standardize all the images in my dataset to the same size. I spent the whole day trying to find a good solution but nothing worked. I succeeded in resizing, but that compromised my image quality, so that is why I want to proceed with padding. How do I do this? Thanks in advance! :) submitted by /u/whyhateverything [link] [comments]  ( 1 min )
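    One common approach, sketched here as a suggestion rather than the only answer: compute the border widths needed to center the image on a fixed-size canvas and fill the borders with a constant value. The numpy version below mirrors what `torch.nn.functional.pad` (or `torchvision.transforms.Pad`) would do on a tensor; the helper name and the centering choice are my own assumptions.

    ```python
    import numpy as np

    def pad_to_size(img, target_h, target_w, fill=0):
        """Center an (H, W, C) image on a target_h x target_w canvas,
        padding the borders with `fill` instead of resizing."""
        h, w = img.shape[:2]
        top = (target_h - h) // 2
        left = (target_w - w) // 2
        pad = ((top, target_h - h - top), (left, target_w - w - left), (0, 0))
        return np.pad(img, pad, mode="constant", constant_values=fill)

    img = np.ones((20, 30, 3), dtype=np.uint8)
    out = pad_to_size(img, 32, 32)
    # out.shape == (32, 32, 3): original pixels centered, borders zero-filled
    ```

    The same border widths can be passed to `torch.nn.functional.pad(tensor, (left, right, top, bottom))` on a CHW tensor, which keeps the operation inside the PyTorch pipeline.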
    [P] SSO (Single Sign-On) for CVAT, the labelling tool
    Hi everyone, For quite a long time, I have seen folks looking for a way to do SSO (Single Sign-On) for CVAT, a popular labeling tool. But unfortunately such capability is not readily available. It only supports local authentication and LDAP. So we decided to make a change proactively, and now we are in the process of enabling SSO for it. The initial result looks promising. Check it out to see what we have done: https://www.youtube.com/watch?v=R7hBBLG5Fdc Is this something that you would love to have? Any other machine learning tools that you want to have SSO capability as well? Any feedback is welcome. submitted by /u/alexcgg1 [link] [comments]  ( 1 min )
    [D] Take Information Theory before the first course in machine learning?
    Hello, I will study Reinforcement Learning and Deep Learning further in the future. I have completed probability theory, linear algebra, and multivariable calculus. I am taking Mathematical Statistics. Should I take Information Theory (IT) before ML? For me, I would definitely take IT, but I don't know whether to take it now or later. submitted by /u/nwe2rw [link] [comments]  ( 1 min )
    [D] Paper Explained - Improving Intrinsic Exploration with Language Abstractions (Full Video Analysis)
    https://youtu.be/NeGJAUSQEJI Exploration is one of the oldest challenges for Reinforcement Learning algorithms, with no clear solution to date. Especially in environments with sparse rewards, agents face significant challenges in deciding which parts of the environment to explore further. Providing intrinsic motivation in form of a pseudo-reward is sometimes used to overcome this challenge, but often relies on hand-crafted heuristics, and can lead to deceptive dead-ends. This paper proposes to use language descriptions of encountered states as a method of assessing novelty. In two procedurally generated environments, they demonstrate the usefulness of language, which is in itself highly concise and abstractive, which lends itself well for this task. ​ OUTLINE: 0:00 - Intro 1:10 - Paper Overview: Language for exploration 5:40 - The MiniGrid & MiniHack environments 7:00 - Annotating states with language 9:05 - Baseline algorithm: AMIGo 12:20 - Adding language to AMIGo 22:55 - Baseline algorithm: NovelD and Random Network Distillation 29:45 - Adding language to NovelD 31:50 - Aren't we just using extra data? 34:55 - Investigating the experimental results 40:45 - Final comments ​ Paper: https://arxiv.org/abs/2202.08938 submitted by /u/ykilcher [link] [comments]  ( 1 min )
    [N] Are we running out of AI benchmarks?
    Benchmarks are an important way to measure progress in AI research – but artificial intelligence is constantly achieving new bests. Are we running out of AI benchmarks? ... Researchers at the Medical University of Vienna and the University of Oxford now show in a meta-study of AI benchmarks that saturated or stagnant benchmarks are common. The researchers examined 1,688 benchmarks with 406 tasks in computer vision and natural language processing since 2013, and draw the following conclusions: In some cases, there would be continuous growth, such as in the ImageNet benchmark. However, a majority of all benchmarks quickly reach technological stagnation or saturation. In some cases, a lack of research interest is also a cause of stagnation. The researchers cite the UCF101 action recognition benchmark as an example of saturation. However, the dynamics of performance improvement do not follow a clearly discernible pattern: in some cases, phases of stagnation are followed by unpredictable leaps. This is what happened in the PROTEINS benchmark. ... Moreover, of the 1,688 benchmarks, only 66 percent have more than three results at different points in time – so in practice, 33 percent of all AI benchmarks are not used and therefore useless. ... In the future, new benchmarks should be developed by large, collaborative teams from many institutions, knowledge domains, and cultures to ensure high-quality benchmarks and avoid fragmentation of the benchmark landscape, the researchers conclude. Source: https://mixed-news.com/en/are-we-running-out-of-ai-benchmarks/ submitted by /u/Sephirio [link] [comments]  ( 1 min )
    [D] Methods for anomaly detection / clustering with high dimensional physics data
    I'm looking for models, workflows and algorithms in the pursuit of principled ways of conducting anomaly detection on high-dimensional datasets from physical systems. I am already familiar with the application of autoencoders, isolation forests, etc. to trivial feature sets. My feature sets obey physical equations, so there is also the possibility of using differential equations or some prior generating process to bound what is and isn't an "outlier". Looking for papers/methods/texts in this vein. submitted by /u/memproc [link] [comments]  ( 2 min )
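    A simple physics-informed sketch along these lines (the oscillator, the injected spike and the threshold rule are all hypothetical stand-ins): score each sample by the residual of the governing differential equation, so points that violate the physics stand out regardless of how "normal" they look statistically.

```python
import numpy as np

# Toy stand-in system: samples from a simple harmonic oscillator, x'' + x = 0.
t = np.linspace(0, 20, 2001)
dt = t[1] - t[0]
x = np.cos(t)
x[1000] += 0.5  # inject an anomalous spike

# Residual of the governing ODE via central finite differences:
# r_i = (x_{i+1} - 2 x_i + x_{i-1}) / dt^2 + x_i, near zero for physical samples.
resid = (x[2:] - 2 * x[1:-1] + x[:-2]) / dt**2 + x[1:-1]

# Flag samples whose residual is far above the typical discretization error.
threshold = 10 * np.median(np.abs(resid))
anomalies = np.where(np.abs(resid) > threshold)[0] + 1  # shift back to x's index
```

    The same residual idea generalizes to multivariate systems: any known conservation law or equation of motion gives a per-sample "physics violation" score that can complement a purely statistical detector.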
    Rejecting GAN Off-Manifold Samples? [D]
    I am working on a project, where I do image editing in the latent space of an image. Are there any papers or suggestions on how to enforce that the samples lie on the manifold? submitted by /u/avd4292 [link] [comments]  ( 1 min )
    [R] Cross-lingual Wikipedia dataset
    Hi! For a research project, I am trying to create a dataset that contains: the abstract of an article in EN; the abstract of the article in simple EN; the rest of the article in EN; the rest of the article in simple EN. When I worked in one language, I preprocessed the XML directly (the APIs seemed quite slow for processing the whole encyclopedia). However, I am struggling to find a way to join Wikis in different languages, as the dumps seem not to include a language-independent id. This seems to be a relatively "standard" task for creating cross-lingual datasets, so I hope someone has some tips, and I do not need to spend the next week reinventing the wheel :) submitted by /u/ombelicoInfinito [link] [comments]  ( 1 min )
    [D] Activation functions for Neural Networks in Time Series
    Hi everyone, I want to run a feedforward autoregressive network to forecast the potential sales of specific SKUs for the following months. The idea is to capture the non-linearity in the data. Does it make sense to use ReLU? Or, given that all my data points are positive values, will this function just return the same number (max(0, x)) and therefore not be suitable for what I am trying to do? I have also checked other activation functions, but sigmoid, for example, is for classification, and hyperbolic tangent returns values that can be negative. Any help would be much appreciated. Thanks! submitted by /u/Old-Box-6684 [link] [comments]  ( 1 min )
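    A small hand-computed sketch that may resolve the worry: ReLU is applied to hidden pre-activations w*x + b, not to the raw inputs, and since weights and biases can be negative, the network stays non-linear even when every data point is positive. The weights below are hand-picked purely for illustration.

```python
import numpy as np

# Two hidden ReLU units with hand-picked weights [1, -1] and biases [0, 5].
def tiny_relu_net(x):
    h = np.maximum(0.0, np.array([1.0 * x + 0.0, -1.0 * x + 5.0]))  # hidden layer
    return h.sum()  # output layer just sums the hidden units

# For positive inputs the output is flat (5) until x = 5, then grows linearly,
# i.e. the mapping is piecewise linear, not max(0, x).
ys = [tiny_relu_net(x) for x in (1.0, 3.0, 8.0)]  # -> [5.0, 5.0, 8.0]
```

    So ReLU on hidden layers is fine for positive sales data; for the output layer of a regression you would typically use no activation at all (linear output).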
    [D] I just watched AlphaGo - The Movie, are there any more well made documentaries about AI available?
    AlphaGo - The Movie on Youtube submitted by /u/Mighty__hammer [link] [comments]  ( 1 min )
    [P] Cyclemoid -- a new activation function inspired by cyclical learning rates; SOTA on several benchmarks
    Excited to share our latest research. The cyclemoid activation was inspired by the success of cyclical learning rates. Moreover, it has nice mathematical properties to stabilize gradients and maintain strong gradient signals in desired regions during training. We designed it as a drop-in replacement for ReLU, and we would love to hear what you think. The code is up on GitHub, and the preprint should be up soon, too: https://github.com/rasbt/cyclemoid-pytorch PS: Currently, we only have a PyTorch implementation but would welcome it if someone could port it to TensorFlow/Keras (my TF/Keras skills are just too rusty for it.) submitted by /u/seraschka [link] [comments]  ( 2 min )
    [D] I'm creating a tool to enrich your datasets with relevant external data
    Hey all, I love doing market research and all kinds of exploratory analyses, but getting the data is a major pain point, as it is in many places (data dumps, apis, marketplaces, web data) and in all kinds of formats I'm trying a different approach, where instead of searching for data sources, and then integrating manually, you just upload your dataset. My service has a large index with datasets and api providers, and finds relevant ones for your dataset which you can add easily. ​ example search via sdk Does this seem useful to you? Would love to hear your thoughts submitted by /u/salmiakdrop [link] [comments]  ( 2 min )
    [D] Have there been successful applications of Deep RL to real problems other than board games/Atari?
    Some successful applications I am able to gather, mostly from Deepmind: - comma.ai self-driving system. - Weather nowcasting - Tokamak fusion reactor feedback control - Hardware design: New gen. TPU - Datacenter cooling optimization - Adaptive locomotion for quadrupedal robots * - Portfolio optimization (financial instruments) There is a lot of work in games, particularly board games, but these do not really solve something "useful" for society. I have also seen lots of toy examples with libraries like gym and some robotics, but in general these are rather proof-of-concept models or just models that do not work at all. One that actually does work is Solving Rubik’s Cube with a Robot Hand, not regarding the solution of the cube but its dexterous manipulation with a robotic hand. This is pretty cool, but again, the domain of the problem is too narrow to be considered actually a successful application to a real-world problem. So my question is, am I missing some examples? For example, is any company out there trying to apply deep RL to self-driving vehicles or to NLP, and have they had any success? * Boston Dynamics solves this without ML, just good ol' control theory, so this is a 50-50 win for RL. Edit: Thanks everyone for the responses, I have updated the list with more projects from the comments. Edit 2: Took Alphafold out because the current version (2.x) does not use RL. submitted by /u/sid_276 [link] [comments]  ( 4 min )
    [D] Request for Advice: Text-to-Image Synthesis
    Hello all, I hope that you are all doing good. The thing is that I want to study text-to-image synthesis; but I see that there are lots of work already done up until now. I am kind of confused about how I should start studying. About my experience, I am trying to learn as much as I can. I started by following Andrew Ng’s Machine Learning course on Coursera; and now I continue with Deep Learning Specialization (almost finished). Apart from these, I check NVIDIA blog, some YouTube channels, Slack communities and of course here on Reddit along with a few other channels. As I said, I am trying to follow what’s going on. To tell the truth, I didn’t do much in terms of building models. I mean, not a real-life project or something like that. My experience is more based on projects for the courses that I attend/attended. By the way, I don’t know how it will affect things’ turning out; but I have recently begun a graduate program as well in artificial intelligence. Moreover, during my undergraduate program, I took some courses that might help (at least I guess so) including Calculus and Linear Algebra. I also want to mention that as for hardware there is an NVIDIA Jetson Nano Developer Kit available to me. I hope that what I am asking is clear and information I provided help you answering. If not, please ask me. Best regards. submitted by /u/fgokmenoglu [link] [comments]  ( 1 min )
    [D] Do you use data engineering pipelines for real life projects?
    Usually I am working on huge and complex datasets with millions of rows (or images if it's CV) and most often I just feed them to pandas in a notebook, then transfer the code to a script and run it when it's needed. Then with the result I train my models. No external tools used for this. Do you have experience with data pipeline tools/frameworks and data validation tools/frameworks? For example I just found "Great Expectations" and "Kedro", "Flyte" and I was wondering at which point in time and project complexity should we choose one of these tools instead of the ancient cave man way? Any success/failure stories? submitted by /u/gabegabe6 [link] [comments]  ( 5 min )
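    Before reaching for a full framework, a single hand-rolled validation function already catches much of what tools like Great Expectations formalize; the column names and rules below are hypothetical. The frameworks start paying off once you need scheduled runs, documentation of expectations, and alerting rather than ad-hoc checks in a notebook.

```python
import pandas as pd

# A minimal hand-rolled validation step: each check returns a problem label
# instead of raising, so a pipeline can log or fail on the full list at once.
def validate(df):
    problems = []
    if df["price"].lt(0).any():
        problems.append("negative prices")
    if df["user_id"].isna().any():
        problems.append("missing user_id")
    if not df["country"].isin({"US", "DE", "HU"}).all():
        problems.append("unexpected country code")
    return problems

df = pd.DataFrame({"price": [9.99, -1.0],
                   "user_id": [1, None],
                   "country": ["US", "XX"]})
issues = validate(df)  # all three checks fail on this toy frame
```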
    [D] The Joy of Finding Things Out (essay)
    Currently I’m trying to figure out if I want to stay in academia or not. My decision hinges on what makes doing science enjoyable. My current understanding is in order for science to be enjoyable, there has to be an element of surprise. A tension that builds. And release of that tension. This is most obvious in theoretical works where the scientist makes a prediction, and later empirical data verifies that prediction. Excitement. Joy. Wonder. In experimental work, surprise can take the form of not knowing how the experiment will turn out. Once you get the result of the experiment, it'll disambiguate competing theories you had in your mind or elucidate a new theory. "Everything clicks" or at the very least you'll be put into a fever trying to integrate the new surprising data with previous …  ( 10 min )
    [D] A fine grained classification dilemma
    Hi Folks, I'm in a bit of a dilemma here on a specific fine grained classification problem that we have. The task at hand is to classify the subspecies of plant seeds. Firstly, the seeds are super tiny, but we have camera setup to take zoomed images of the seeds. Secondly, even the experts can't tell the exact difference between 2 subspecies of seeds. They go by their experience and intuition. All the subspecies put together looks similar but the data points within the same subspecies are vastly different. The requirement being classifying the subspecies based on their morphological properties. Here is the catch, the requirement is 98%+ accuracy. I tried few preprocessing and models (transformers, resnets, inception and other sort) but can't hit the 98% accuracy mark. Even if I could, the model sometimes fail on external sets or on production. I would like some expert (ML side) take on this issue on how to approach this. submitted by /u/happy_happy_feet [link] [comments]  ( 4 min )
    [D] how to preprocess a 13k column dataset.
    I have a single cell rna seq dataset containing 13k features. I would like to preprocess the dataset. What are the best methods to do that? Also, how to apply feature elimination/selection on this unsupervised data? Thanks submitted by /u/Striking-Machine2763 [link] [comments]  ( 1 min )
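    One common numpy-only sketch of such a pipeline (the counts matrix, gene counts, and cutoffs below are hypothetical): library-size normalization, log1p, unsupervised selection of the most variable genes, then PCA. This is roughly what scRNA-seq toolkits such as Scanpy do under the hood, and high-variance filtering is a standard unsupervised feature-selection step when there are no labels.

```python
import numpy as np

# Toy stand-in for a cells x genes count matrix (200 cells, 13k genes).
rng = np.random.default_rng(1)
counts = rng.poisson(1.0, size=(200, 13000)).astype(float)

# 1. Normalize each cell to the same total count, then log-transform.
norm = counts / counts.sum(axis=1, keepdims=True) * 1e4
logged = np.log1p(norm)

# 2. Unsupervised feature selection: keep the 2000 most variable genes.
var = logged.var(axis=0)
top = np.argsort(var)[-2000:]
selected = logged[:, top]

# 3. PCA via SVD on the centered matrix; keep 50 components per cell.
centered = selected - selected.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
pcs = U[:, :50] * S[:50]
```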
    [R] Nix-TTS 🐤: An incredibly lightweight text-to-speech via non end-to-end distillation
    Hi, Reddit! Excited to share with you guys, Nix-TTS 🐤! Our latest research in lightweight neural Text-to-Speech. We've seen how synthetic voices generated by recent neural TTS are getting more and more natural, but most of the time the models suffer from slow CPU inference and are not end-to-end, which requires an additional vocoder model. Nix-TTS 🐤 is an incredibly lightweight end-to-end TTS model achieved by applying non end-to-end knowledge distillation to a powerful yet large-sized generative TTS teacher model. Our proposed model is end-to-end (vocoder-free) with only 5.23M parameters, or up to an 82% reduction of the teacher model. We also employed a stochastic duration predictor to improve its expressiveness. It can run 10x faster than real-time on an Intel i7 CPU and 0.5 times faster than real-time on a Raspberry Pi Model 3B, making it suitable for deployment in resource-constrained settings. Here we attach the complexity and speedup details from the paper (figure: Nix-TTS speedup and complexity compared to other models). We released the paper (submitted to INTERSPEECH 2022) and the pre-trained models at the links below: 📄 Paper: https://arxiv.org/abs/2203.15643 📦 Repository: https://github.com/rendchevi/nix-tts 🤗 Interactive Demo: https://huggingface.co/spaces/rendchevi/nix-tts A short video demo from the 🤗 HuggingFace Spaces: Nix-TTS Short Demo Let me know what you guys think in the thread! We're very excited to see the potential improvements & applications of this model or method and lightweight TTS in general. Feel free to reach me via DM as well if you'd like to discuss anything further. submitted by /u/sourpeach_ [link] [comments]  ( 2 min )
  • Open

    [D] Current algorithms consistently outperforming SAC and PPO
    Hi community. It has been 5 years now since these algorithms were released, and I don't feel like they have been quite replaced yet. In your opinion, do we currently have algorithms that make either of them obsolete in 2022? submitted by /u/yannbouteiller [link] [comments]  ( 1 min )
    Need Help with Project Idea
    Hey guys! So I am enrolled in a reinforcement learning course at my university, and I am really confused about a decent project idea. Primarily, I want to work on any game based environment apart from atari ones. Using unity seems promising but not sure if that is easy to pull off. Any suggestions to get me started? Thanks submitted by /u/ishon_p [link] [comments]  ( 1 min )
    [D] Completely removing the option of taking illegal actions in a custom gym environment
    Hi, I have a two-step custom gym env which is a graph network optimisation task (two steps due to the high-dimensional action space). In the first step the agent chooses the first node, and the state and reward are passed to training. In the second step, the agent chooses the second node and, with a class attribute holding the history of the first selected node, now has a node pair between which it can then remove or add an edge. The training loop now has a new graph state (edges changed between nodes) plus a vector of the first action selected. The agent is able to learn well; however, even with a negative reward provided to the agent if it takes illegal actions (choosing the same node in each step and thus creating a self connection), even when it learns to maximise reward, it still takes illegal actions…  ( 2 min )
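    Beyond negative rewards, one standard fix for this situation is invalid-action masking: set the logits of illegal actions to -inf before the softmax, so they get exactly zero probability and can never be sampled. A minimal numpy sketch; the mask here is hypothetical, and in the graph case the second-step mask would simply exclude the first chosen node.

```python
import numpy as np

def masked_policy(logits, legal_mask):
    """Zero out illegal actions by masking logits before the softmax."""
    masked = np.where(legal_mask, logits, -np.inf)
    z = masked - masked.max()      # subtract max for numerical stability
    p = np.exp(z)                  # exp(-inf) = 0, so illegal actions vanish
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
legal = np.array([True, True, False, True])  # action 2 would be a self-connection
probs = masked_policy(logits, legal)         # probs[2] is exactly 0
```

    With the mask applied at sampling time (and, for policy-gradient methods, at log-prob evaluation time as well), the illegal-action problem disappears by construction instead of relying on the reward signal.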
    Training drone via DRL to hover with just camera sensor
    Hi Everyone, This is my first time trying reinforcement learning and Unity and I wanted some help. I am trying to train a quadcopter to hover and keep a stationary ball in its camera frame using just the visual input from a camera sensor. So the input to the network would be an 84*84 grayscale image and the output would be the four forces to the rotors. My reward function is shown in the attached image (Reward Function), where x and y are the position of the drone and a is its rotation with respect to the target. I have set A, d and c to 3 and lambda to 1/180. I have also added a condition where, if the quadcopter drops below a certain height from the platform, it is punished with a reward of -1 and the episode resets. The network I am using is the PPO network used by the coin-collector example in ML-Agents. My training log is attached (Training log): the cumulative reward and episode length just drop after a while and the value loss explodes. I think something may be wrong with my rewards or network. If anyone has any ideas about what might be going wrong, that would be great. Thanks submitted by /u/voyager10 [link] [comments]  ( 1 min )
    Do you know any port of StableBaselines 3 to C++?
    Has anybody done it already? submitted by /u/Live_Medium_3949 [link] [comments]
    Is there a way to get PPO controlled agents to move a little more gracefully?
    submitted by /u/user_00000000000001 [link] [comments]  ( 2 min )
    How to make two policies in TRPO, PPO algorithms?
    In both TRPO and PPO, we have r, which is the ratio new_policy/old_policy. Here we collect data from old_policy and improve the new policy. I am confused about how to implement this. Correct me if I am wrong, but I have two ways in mind. 1) I collect the action probability while running the simulation. When optimizing, I use the same neural network to sample another action and its probability; this new action and its probability become my new_policy, and then I optimize the L_clip function. 2) I collect the probability while running the simulation. Before optimizing the PPO objective, I first run a simple policy gradient using the original probabilities I collected. After updating the NN, I once more get new probabilities, which I use in the L_clip function. Can someone please tell me which I should do and why? submitted by /u/Better-Ad8608 [link] [comments]  ( 1 min )
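    For what it's worth, common PPO implementations follow a variant of (1), but without re-sampling: the log-probabilities of the actions actually taken are stored during rollout, and those SAME (state, action) pairs are re-evaluated under the current network at every update. A numeric sketch with made-up log-probabilities and advantages:

```python
import numpy as np

# Frozen values logged during rollout (old policy) vs. the current network's
# log-probs for the very same (state, action) pairs -- no fresh sampling.
old_logp = np.array([-1.20, -0.45, -2.30])
new_logp = np.array([-1.00, -0.50, -1.80])
advantages = np.array([0.5, -1.0, 2.0])

ratio = np.exp(new_logp - old_logp)          # pi_new(a|s) / pi_old(a|s)
eps = 0.2
l_clip = np.minimum(ratio * advantages,
                    np.clip(ratio, 1 - eps, 1 + eps) * advantages)
loss = -l_clip.mean()                        # maximize L_clip = minimize -L_clip
```

    At the first epoch after a rollout the ratio is exactly 1; it only drifts from 1 as the network is updated over subsequent minibatch epochs, which is what the clipping keeps in check.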
    Intrinsic Curiosity Module Pytorch multithreading cpu unable to fix seeds
    Hello, I am working on an extension of this implementation https://github.com/philtabor/Youtube-Code-Repository/tree/master/ReinforcementLearning/ICM of the intrinsic curiosity module. It uses A3C (actor-critic) as a policy and the ICM is a bolt-on module. I need to fix the seeds for reproducibility, but no matter what I have tried I cannot achieve it. The implementation uses multithreading on CPU and plays the OpenAI Gym CartPole or Atari environments. I believe that it has something to do with multithreading, but I'm not sure. Does anyone know the solution? submitted by /u/Formal-Drawing-8421 [link] [comments]  ( 1 min )
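    A best-effort seeding sketch for a single process, assuming a reasonably recent PyTorch. Note that even with every seed fixed, asynchronous A3C workers interleave their gradient updates in a scheduler-dependent order, so strict reproducibility usually also requires per-worker seeds and, in the end, synchronous (A2C-style) updates rather than seeding alone.

```python
import os
import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    """Best-effort single-process determinism; async workers need more."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)            # also seeds all CUDA devices
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        torch.use_deterministic_algorithms(True, warn_only=True)
    except TypeError:                  # older PyTorch without warn_only
        torch.use_deterministic_algorithms(True)

# Re-seeding reproduces the same random stream within one process:
seed_everything(0)
a = torch.randn(3)
seed_everything(0)
b = torch.randn(3)  # identical to a
```

    Each spawned worker should also call a function like this with its own derived seed (e.g. base_seed + worker_id) and seed its environment; otherwise all workers either share one stream or diverge silently.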
    Can the replay buffer omit "done"?
    Hi, These days there are lots of implementations that leave the next state and the done flag out of memory, like the official DrQ-v2 implementation. But I have a question: is it okay to throw out "done" from the replay buffer? From my point of view, there are some problems with the done-related signal. Or did I read the implementation code wrong? submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 1 min )
    Do policy gradient methods also require some mechanism for exploration?
    Algorithms like A2C, A3C, TRPO and PPO use a stochastic policy, i.e. the actions are sampled from a probability distribution, so exploration should be done inherently by these algorithms. Yet, when I am using PPO to train a BipedalWalker, it seems like some exploration mechanism is required, because during the training process rewards first go up and then, after 1K episodes, there is no progress. Please suggest what I can do to stop this from happening. submitted by /u/Better-Ad8608 [link] [comments]  ( 2 min )
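    One common remedy worth trying is an explicit entropy bonus in the loss, which penalizes the policy for collapsing to a near-deterministic distribution too early; most PPO implementations expose this as an `ent_coef`-style hyperparameter. A toy numpy sketch (the loss value and coefficient are illustrative stand-ins):

```python
import numpy as np

def entropy(probs):
    """Shannon entropy of each action distribution (rows)."""
    p = np.clip(probs, 1e-12, 1.0)   # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)

probs = np.array([[0.25, 0.25, 0.25, 0.25],   # exploratory policy
                  [0.97, 0.01, 0.01, 0.01]])  # nearly collapsed policy
ent = entropy(probs)                 # first row has the higher entropy

policy_loss = 1.0                    # stand-in for the clipped surrogate loss
ent_coef = 0.01
total_loss = policy_loss - ent_coef * ent.mean()  # subtracting rewards entropy
```

    If the reward plateau comes with the policy's entropy dropping to near zero, raising the entropy coefficient (or annealing it more slowly) is usually the first thing to try.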
  • Open

    Generating new molecules with graph grammar
    An efficient machine-learning method uses chemical knowledge to create a learnable grammar with production rules to build synthesizable monomers and polymers.  ( 6 min )
  • Open

    Whitepaper: Machine Learning Best Practices in Healthcare and Life Sciences
    For customers looking to implement a GxP-compliant environment on AWS for artificial intelligence (AI) and machine learning (ML) systems, we have released a new whitepaper: Machine Learning Best Practices in Healthcare and Life Sciences. This whitepaper provides an overview of security and good ML compliance practices and guidance on building GxP-regulated AI/ML systems using AWS […]  ( 3 min )
  • Open

    Introducing CVSS: A Massively Multilingual Speech-to-Speech Translation Corpus
    Posted by Ye Jia and Michelle Tadmor Ramanovich, Software Engineers, Google Research Automatic translation of speech from one language to speech in another language, called speech-to-speech translation (S2ST), is important for breaking down the communication barriers between people speaking different languages. Conventionally, automatic S2ST systems are built with a cascade of automatic speech recognition (ASR), text-to-text machine translation (MT), and text-to-speech (TTS) synthesis sub-systems, so that the system overall is text-centric. Recently, work on S2ST that doesn’t rely on intermediate text representation is emerging, such as end-to-end direct S2ST (e.g., Translatotron) and cascade S2ST based on learned discrete representations of speech (e.g., Tjandra et al.). While early vers…  ( 9 min )
  • Open

    AI-generated pranks for your computer to play on you
    I've tried various methods of using AI to generate April Fools pranks for you to play on other people (although often they turned out to be pranks you play on yourself). But this is the first time I've tried to generate pranks for a computer to  ( 4 min )
    Bonus: Ada's pranks
    AI Weirdness: the strange side of machine learning  ( 1 min )

  • Open

    [D] Are there any good solutions for multimodal classification? Libraries, AutoML tools?
    Hi, Reddit! I'm a data scientist working in the dating app startup field. We have been trying to set up people with blind dates. We can get multimodal data (texts, images, and audio) and we want to do this: collect negative and positive pairs from each user's swiping history and do a binary classification (matching). Then we would get together as many users' data as possible and train a model. Though it sounds nice, this is hard, as we could not find an existing developer tool or paper that supports combining such rich multimodal data. Is there any existing research out there to achieve this task? Moreover, is there already any library or AutoML tool that supports this? I checked Google AutoML; it does not support this. Any help and advice would be much appreciated. submitted by /u/meame2010 [link] [comments]  ( 1 min )
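    Absent an off-the-shelf tool, a common pattern is late fusion: one encoder per modality, concatenate the embeddings, and put a small classifier head on top. A hedged PyTorch sketch with hypothetical embedding dimensions; in practice each embedding would come from a pretrained text, image, or audio backbone rather than random tensors.

```python
import torch
import torch.nn as nn

class LateFusionMatcher(nn.Module):
    """Concatenate per-modality embeddings, then classify match / no-match."""

    def __init__(self, text_dim=768, image_dim=512, audio_dim=128):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(text_dim + image_dim + audio_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),   # one logit for binary matching
        )

    def forward(self, text_emb, image_emb, audio_emb):
        fused = torch.cat([text_emb, image_emb, audio_emb], dim=-1)
        return self.head(fused)

model = LateFusionMatcher()
# Batch of 4 hypothetical (text, image, audio) embedding triples:
logits = model(torch.randn(4, 768), torch.randn(4, 512), torch.randn(4, 128))
```

    For the pairwise (user A, user B) setup, the same head can take the concatenation of both users' fused embeddings, trained with a binary cross-entropy loss on the swipe outcomes.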
    [P] LAION-5B: public dataset of 5.85 billion image-text pairs
    LAION-5B: A new era of open large-scale multi-modal datasets. Twitter thread. Related: [P] LAION-400M: open-source dataset of 400 million image-text pairs. I am not affiliated with this project. submitted by /u/Wiskkey [link] [comments]
    [P] Adapting pixel attribution methods for models that output embeddings
    Hi r/MachineLearning, https://github.com/jacobgil/pytorch-grad-cam is a project that has a comprehensive collection of Pixel Attribution Methods for PyTorch (like the package name grad-cam, which was the original algorithm implemented). Typically pixel attribution methods are adapted for classification: they let you understand what part of the image corresponds to a certain classification category. However, some deep learning models output embeddings instead of category scores. You can then match these embeddings against other embeddings and measure their similarity. For example: in face recognition models, or in self-supervised networks. In this case, to apply pixel attribution, we could create embeddings of concepts, and then for new query images we would be asking: "what parts of the image have feature representations that match the concept features?" Or in other words: "where in the query image do we see the concepts?" I wrote a tutorial that shows how to use the pytorch-grad-cam project to adapt pixel attribution for the embedding case, and visualize where different concept feature representations match the image: https://github.com/jacobgil/pytorch-grad-cam/blob/master/tutorials/Pixel%20Attribution%20for%20embeddings.ipynb An example is the image below. The two left images are "concept" images of clouds, and a car. Then given a new query image, we can try to see where in the image we see feature representations that match these concepts. Given images of concepts and a query image, attribute what parts of the query image match the concepts. I hope someone finds this useful! submitted by /u/jacobgil [link] [comments]  ( 1 min )
    [D] Does anyone know how to create animations like in the Google AI Blog?
    I really would like to do some visualization of my ideas. I found the animation in the google ai blog: https://ai.googleblog.com/2022/02/4d-net-learning-multi-modal-alignment.html Anyone knows how to do this stuff, especially with the flowing lines? Any software suggestions? submitted by /u/KonArtist01 [link] [comments]  ( 1 min )
    [D] Data centric fixes to a model that's fit to a spurious correlation
    Say a computer vision model learns a pattern you don't want it to, and you know that it's learnt it because of analysis through tools like occlusion sensitivity maps. What data-centric techniques can you use to resolve it? Could some form of cropping augmentation do the trick? A classic example is a ruler beside a melanoma: while there may be a correlation between the presence of a ruler and the presence of melanoma, you don't want the model to depend on that information because it may not exist 'in production'. Below is a quotation describing another similar problem. "In another paper a similar issue was found because doctors sometimes use purple markers to highlight potentially-malignant skin cancers for easier examination. Some argue that the purple marks are a real signal that should be incorporated in the model just as the visual appearance of the tumor itself is incorporated. However, if your goal is robust generalizability over time it is probably best to not have your AI incorporate the human applied purple marks as signal, as the standards for applying those marks may vary across teams and across time." https://menloml.com/2020/01/11/recognizing-a-ruler-instead-of-a-cancer/ If you're working with that dataset, what tools are available to you to solve that problem? submitted by /u/Georgehwp [link] [comments]  ( 1 min )
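    As a concrete data-centric option along the cropping-augmentation line, random erasing occludes a random patch each epoch so the model cannot rely on a single localized cue such as the ruler. A minimal numpy sketch; the patch-size fractions are hypothetical, and torchvision ships a RandomErasing transform implementing the same idea.

```python
import numpy as np

def random_erase(img, rng, min_frac=0.1, max_frac=0.3):
    """Occlude a random rectangular patch of an HxWxC image."""
    h, w, _ = img.shape
    eh = int(h * rng.uniform(min_frac, max_frac))   # patch height
    ew = int(w * rng.uniform(min_frac, max_frac))   # patch width
    y = rng.integers(0, h - eh + 1)
    x = rng.integers(0, w - ew + 1)
    out = img.copy()
    out[y:y + eh, x:x + ew] = img.mean()            # fill with the image mean
    return out

rng = np.random.default_rng(0)
img = rng.uniform(size=(224, 224, 3))
aug = random_erase(img, rng)   # same shape, one patch occluded
```

    If the spurious cue has a known location (rulers tend to sit at the image border), a more targeted variant is to crop or erase the border region specifically, or to collect counterexamples (rulers without melanoma) so the correlation breaks in the data itself.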
    [P] Marketing Mix Modeling - How can we solve for negative Media Coefficients?
    Hi everyone, I'm working on a Marketing Mix Modeling project for a client. I'm using Python and the scikit-learn library to do regression analysis with Ridge and Linear Regression. I have pretty good results: R^2 = 0.87, MAPE = 0.2. But some of my media coefficients are negative, and this doesn't make sense business-wise. How can I model positive media coefficients without using Bayesian modeling? submitted by /u/datagabriele [link] [comments]  ( 3 min )
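    One non-Bayesian option is to constrain the fit directly: non-negative least squares (scipy.optimize.nnls), or in recent scikit-learn versions `LinearRegression(positive=True)`, forces every coefficient to be >= 0. A sketch on synthetic data; the spend matrix and true coefficients below are made up for illustration.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical data: 100 periods of spend on 3 media channels.
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))
true_coef = np.array([2.0, 0.5, 1.5])
y = X @ true_coef + rng.normal(scale=0.05, size=100)

# Solve min ||X w - y||_2 subject to w >= 0.
coef, residual_norm = nnls(X, y)   # coefficients are non-negative by construction
```

    Note that a forced-positive coefficient on a channel the unconstrained model wanted negative often signals multicollinearity between channels or a missing control variable (seasonality, base demand), so it is worth checking those before trusting the constrained fit.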
    [R] projUNN: efficient method for training deep networks with unitary matrices
    Paper: https://arxiv.org/abs/2203.05483 TL;DR; from LeCun: Recurrent nets in which the weight matrix is unitary are interesting beasts: they are invertible, they don't suffer from vanishing/exploding gradient, and they perform computation akin to what happens in quantum computers. training them can be difficult or expensive. We propose a low-rank (low-cost) update method to update unitary weight matrices with gradient descent. Abstract: In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-k updates -- or their rank-k approximation -- that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full N-dimensional unitary or orthogonal matrices with a training runtime scaling as O(kN^2). Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-T) or transports unitary matrices in the direction of the low-rank gradient (projUNN-D). Even in the fastest setting (k=1), projUNN is able to train a model's unitary parameters to reach comparable performances against baseline implementations. By integrating our projUNN algorithm into both recurrent and convolutional neural networks, our models can closely match or exceed benchmarked results from state-of-the-art algorithms. submitted by /u/lostmsu [link] [comments]  ( 1 min )
    [D] Are text embeddings tabular data?
    I keep hearing that NNs are not the best way to approach tabular data. But when it comes to document classification, in terms of using embeddings for a downstream classification task, would that be considered tabular data? You will end up with data that fits in a table that you wish to classify... it's high dimensional, but you could reduce dimensions until you end up with just a smaller set of columns and the labels. I guess I'm unclear about what defines tabular data in this context, and if it makes sense to use a different model (like XGBoost) for the classification task, vs having it as a final layer in the embedding network. submitted by /u/bandalorian [link] [comments]  ( 3 min )
  • Open

    Prepare data from Databricks for machine learning using Amazon SageMaker Data Wrangler
    Data science and data engineering teams spend a significant portion of their time in the data preparation phase of a machine learning (ML) lifecycle performing data selection, cleaning, and transformation steps. It’s a necessary and important step of any ML workflow in order to generate meaningful insights and predictions, because bad or low-quality data greatly […]  ( 9 min )
  • Open

    The Myth of Analytic Talent Shortage
    I tested the job market in the last two weeks, both as an applicant, and as a hiring manager. I share my experience here. It is radically different from what you read in the news, or from what most people say. Data scientists and machine learning engineers looking for a new job are out there.… Read More »The Myth of Analytic Talent Shortage The post The Myth of Analytic Talent Shortage appeared first on Data Science Central.  ( 6 min )
  • Open

    Policy Gradients with pytorch
    what will the shape of the tensor "probs" (marked) look like? Will it look like this: [0.32, 0.40, 0.28]? Or like this: [[0.32, 0.40, 0.28], [0.32, 0.40, 0.28], [0.32, 0.40, 0.28], [0.32, 0.40, 0.28]]? https://preview.redd.it/3qiq3lzz6sq81.png?width=829&format=png&auto=webp&s=c15ece0e38b9dabf3f443466b2553e146845e92a submitted by /u/Whole_Run_4485 [link] [comments]  ( 1 min )
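    Both shapes occur, depending on what is fed to the network: a single state gives a 1-D probs tensor, a batch of states a 2-D one, and torch.distributions.Categorical handles either. A small sketch (the probability values are taken from the question):

```python
import torch
from torch.distributions import Categorical

# Single state: probs is 1-D with one entry per action.
single = Categorical(probs=torch.tensor([0.32, 0.40, 0.28]))

# Batch of 4 states: probs is 2-D, one row of action probabilities per state.
batch = Categorical(probs=torch.tensor([[0.32, 0.40, 0.28]] * 4))

a1 = single.sample()   # scalar action, shape ()
a4 = batch.sample()    # one action per state, shape (4,)
```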
    What are the main roads to AGI?
    I was wondering if you can help me come up with a list of all the specific proposals for how to achieve AGI. For example, one of them is: scaling is all you need. In more detail: scaling self-supervised pretrained deep network models (a.k.a. foundation models), data, and compute can lead to AGI (this assumes "smart" scaling, i.e. exponents in the neural scaling laws that are as steep/cost-efficient as possible). Do you know what other main roads there are to AGI? submitted by /u/No_Possibility_7588 [link] [comments]  ( 1 min )
    Sparse Reward Environments and Off Policy Algorithms
    In my experience, (continuous action space) off-policy algorithms are generally more sample efficient than on-policy algorithms, but don't perform as well in sparse-reward environments. Are there any papers that address this issue? Do you know of any algorithms that are both sample efficient and learn well in sparse-reward environments? submitted by /u/SirRantcelot [link] [comments]  ( 1 min )
    How to deal with delayed, dense rewards
    I have a doubt that may be a little stupid, but I'm asking to be sure. Assume that in my environment rewards are delayed by a random number n of steps, i.e. the agent takes an action but receives the reward n steps after taking that action. At every step a reward is produced, so the reward r_t in the transitions (s_t, a_t, r_t, s_{t+1}) collected by the agent is actually the reward corresponding to the transition at time t-n. An example scenario: the RL agent controls a transportation network, and a reward is generated only when a package reaches its destination. Thus, the reward arrives with possibly several steps of delay with respect to when the relevant actions were taken. Now, I know that delayed rewards are not generally an issue, e.g. in all those settings where there is only one reward of +1 at the end, but I am wondering if this case is equivalent. What makes me wonder is that here, from state s_t to state s_{t+n}, there are n rewards in between that depend on states prior to s_t. Does this make the problem non-Markovian? How can one learn the value function V(s_t) if its estimate is always affected by unrelated rewards r_{t-n}, ..., r_{t-1}? submitted by /u/fedetask [link] [comments]  ( 2 min )
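One workaround worth noting: if the delay n is known and fixed (an assumption of mine, not stated in the post, where n is random), the rewards can simply be re-aligned with the transitions that produced them before computing returns, which restores the usual credit assignment. A minimal sketch in plain Python:

```python
def realign_rewards(rewards, n):
    """Shift each reward n steps earlier, so r_t is credited to the
    transition at time t - n that actually produced it. Rewards for the
    last n transitions would arrive after the recorded episode, so they
    are padded with 0.0 here (a simplification)."""
    return rewards[n:] + [0.0] * n

def discounted_returns(rewards, gamma=0.99):
    # Standard backward pass: G_t = r_t + gamma * G_{t+1}.
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

# Toy episode: rewards observed with a fixed delay of n=2 steps.
observed = [0.0, 0.0, 1.0, 0.0, 2.0]
aligned = realign_rewards(observed, n=2)
print(aligned)  # the two nonzero rewards now sit at the steps that earned them
print(discounted_returns(aligned))
```

With a random, unknown delay this trick no longer applies directly, which is exactly where the non-Markovian concern in the question comes in.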
    Sim2Real
    Hi! Does anyone know the “right” way to apply a policy to a robotic manipulator? Right now I’m trying to create a real environment and run the policy on it, but I can’t find anything on the web about this. Thanks! submitted by /u/Big-Picture8323 [link] [comments]  ( 1 min )
  • Open

    Featured video: L. Rafael Reif on the power of education
    At Monterrey Tec, MIT’s president discusses the impact of education in addressing global issues.  ( 3 min )
  • Open

    Last Week in AI podcast: DeepMind Mafia, DishBrain, PRIME, ZooKeeper AI, Instant NeRF
    submitted by /u/regalalgorithm [link] [comments]
    Newbie question
    Hello guys, I was wondering if anyone knew the easiest way to combine images together. Ideally I would have a bunch of images and it would take components of a couple (or just two) and put them together. I want to generate images of morphed anime figures, it doesn’t even need to look professional (or good lol). Just need some sort of website or software that I can easily achieve this. Any tips or ideas would be greatly appreciated!! Thank you! submitted by /u/misakimeifanpage [link] [comments]  ( 1 min )
    Fighting AI's discrimination in mortgage lending | DualFair
    submitted by /u/qptbook [link] [comments]
    Researchers from U Texas and Apple Propose a Novel Transformer-Based Architecture for Global Multi-Object Tracking
    Multi-object tracking aims to locate and track all objects in a video feed. It’s a fundamental component in domains like mobile robotics, where an autonomous system must navigate dynamic surroundings populated by other mobile agents. Thanks to breakthroughs in deep learning and object detection, tracking-by-detection has become the dominant tracking paradigm in recent years. Tracking-by-detection simplifies the process by reducing it to just two steps: detection and association. First, an object detector searches each video frame for candidate objects. The second phase is an association step, which links detections over time. Local trackers handle pairwise associations greedily: they maintain each trajectory’s state based on its position and/or identity features, and match current-frame detections against its last visible state. Continue Reading The Research Summary Paper: https://arxiv.org/pdf/2203.13250.pdf Github: https://github.com/xingyizhou/GTR https://i.redd.it/312ahhaxtqq81.gif submitted by /u/No_Coffee_4638 [link] [comments]  ( 1 min )
    AI generated Personalized Implicit Neural Avatars (PINA)
    submitted by /u/imapurplemango [link] [comments]
    Instant NeRF: Turn 2D Images into 3D Models in Milliseconds
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 1 min )
  • Open

    Jigsaw fixes bugs in machine-written software
    Large pre-trained language models such as GPT-3, Codex, and others can be tuned to generate code from natural language specifications of programmer intent. Such automated models have the potential to improve productivity for every programmer in the world. But since the models can struggle to understand program semantics, the quality of the resulting code can’t […] The post Jigsaw fixes bugs in machine-written software appeared first on Microsoft Research.  ( 8 min )
2022-04-30T01:01:35.533Z osmosfeed 1.14.4